NUCLEIC ACID ENCODING A SIGNAL MEDIATOR PROTEIN
THAT INDUCES CELLULAR MORPHOLOGICAL ALTERATIONS 



Pursuant to 35 U.S.C. §202(c), it is hereby
acknowledged that the U.S. Government has certain
rights in the invention described herein, which was
made in part with funds from the National Institutes of
Health.

FIELD OF THE INVENTION 

This invention relates to diagnosis and
treatment of neoplastic diseases. More specifically,
this invention provides novel nucleic acid molecules,
proteins and antibodies useful for detection and/or
regulation of complex signalling events leading to
morphological and potentially neoplastic cellular
changes.

BACKGROUND OF THE INVENTION 

Cellular transformation during the
development of cancer involves multiple alterations in
the normal pattern of cell growth regulation. Primary
events in the process of carcinogenesis involve the
activation of oncogene function by some means (e.g.,
amplification, mutation, chromosomal rearrangement),
and in many cases the removal of anti-oncogene
function. In the most malignant and untreatable
tumors, normal restraints on cell growth are completely
lost as transformed cells escape from their primary
sites and metastasize to other locations in the body.
One reason for the enhanced growth and invasive
properties of some tumors may be the acquisition of
increasing numbers of mutations in oncogenes, with
cumulative effect (Bear et al., Proc. Natl. Acad. Sci.
USA 86:7495-7499, 1989). Alternatively, insofar as
oncogenes function through the normal cellular
signalling pathways required for organismal growth and
cellular function (reviewed in McCormick, Nature
363:15-16, 1993), additional events corresponding to
mutations or deregulation in the oncogenic signalling
pathways may also contribute to tumor malignancy (Gilks
et al., Mol. Cell Biol. 13:1759-1768, 1993), even
though mutations in the signalling pathways alone may
not cause cancer.

Several discrete classes of proteins are
known to be involved in conferring the different types
of changes in cell division properties and morphology
associated with transformation. These changes can be
summarized as, first, the promotion of continuous cell
cycling (immortalization); second, the loss of
responsiveness to growth inhibitory signals and cell
apoptotic signals; and third, the morphological
restructuring of cells to enhance invasive properties.

Of these varied mechanisms of oncogene
action, the role of control of cell morphology is one
of the least understood. Work using non-transformed
mammalian cells in culture has demonstrated that simply
altering the shape of a cell can profoundly alter its
pattern of response to growth signals (DiPersio et al.,
Mol. Cell Biol. 11:4405-4414, 1991), implying that
control of cell shape may actually be causative of,
rather than correlative to, cell transformation. For
example, mutation of the anti-oncogene NF2 leads to
development of nervous system tumors. Higher
eucaryotic proteins involved in promoting aberrant
morphological changes related to cancer may mediate
additional functions in normal cells that are not
obviously related to the role they play in cancer
progression, complicating their identification and
characterization. Identification and characterization
of such genes and their encoded proteins would be
beneficial for the development of therapeutic
strategies in the treatment of malignancies.

Recent evidence suggests that certain key
proteins involved in control of cellular morphology
contain conserved domains referred to as SH2 and SH3
domains. These domains consist of non-catalytic
stretches of approximately 50 amino acids (SH3) and 100
amino acids (SH2, also referred to as the "Src homology
domain"). SH2/SH3 domains are found in cytoskeletal
components, such as actin, and are also found in
signalling proteins such as Abl. The interaction of
these proteins may play a critical role in organizing
cytoskeleton-membrane attachments.

Besides the numerous SH2/SH3 containing
molecules with known catalytic or functional domains,
there are several signalling molecules, called "adapter
proteins," which are so small that no conserved domains
seem to exist except SH2 and SH3 domains. Oncoproteins
such as Nck, Grb2/Ash/SEM5 and Crk are representatives
of this family. The SH2 regions of these oncoproteins
bind specific phosphotyrosine-containing proteins by
recognizing a phosphotyrosine in the context of several
adjacent amino acids. Following recognition and
binding, specific signals are transduced in a
phosphorylation dependent manner.

As another example, P47v-Crk (Crk) is a
transforming gene from avian sarcoma virus isolate
CT10. This protein contains one SH2 and one SH3
domain, and induces an elevation of tyrosine
phosphorylation on a variety of downstream targets.
One of these targets, p130cas, is tightly associated
with v-Crk. The SH2 domain of v-Crk is required for
this association and subsequent cellular
transformation. P130cas is also a substrate for Src
mediated phosphorylation. Judging from its structure,
p130cas may function as a "signal assembler" of Src
family kinases and several cellular SH2-containing
proteins. These proteins bind to the SH2 binding
domain of p130cas, which is believed to induce a
conformational change leading to the activation or
inactivation of downstream signals, modulated by
multiple domains of the protein.

Another oncogene, Ras, is a member of a large
evolutionarily conserved superfamily of small
GTP-binding proteins responsible for coordinating specific
growth factor signals with specific changes in cell
shape, including the development of stress fibers and
membrane ruffles (Ridley and Hall, Cell 70:389-399,
1992; Ridley et al., Cell 70:401-410, 1992). A rapidly
growing family of oncoproteins, including Vav, Bcr,
Ect-2, and Dbl, has been found to be involved in a
variety of different tumors (Eva and Aaronson, Nature
316:273-275, 1985; Ron et al., EMBO J. 7:2465-2473,
1988; Adams et al., Oncogene 7:611-618, 1992; Miki et
al., Nature 362:462-465, 1993). Proteins of this
family have been shown to interact with Ras/Rac/Rho
family members, and possess sequence characteristics
that suggest they too directly associate with and
modulate organization of the cytoskeleton.

In view of the significant relationship
between signalling or "adapter" proteins, altered
cellular morphology and the development of cancer, it
would be of clear benefit to identify and isolate such
proteins (or genes encoding them) for the purpose of
developing diagnostic/therapeutic agents for the
treatment of cancer. It is an object of the present
invention to provide a purified nucleic acid molecule
of mammalian origin that encodes a signal mediator
protein (SMP) involved in the signalling cascade
related to morphological cellular changes, and
therefrom provide isolated and purified protein. Such
a gene, when expressed in model systems, such as yeast,
will provide utility as a research tool for identifying
genes encoding interacting proteins in the signalling
cascade, thereby facilitating the elucidation of the
mechanistic action of other genes involved in
regulating cellular morphology and cell division. The
gene may also be used diagnostically to identify
related genes, and therapeutically in gene augmentation
or replacement treatments. It is a further object of
the present invention to provide derivatives of the
SMP-encoding nucleic acid, such as various
oligonucleotides and nucleic acid fragments for use as
probes or reagents to analyze the expression of genes
encoding the proteins. It is a further object of the
invention to provide the signal mediator protein in
purified form, and to provide antibodies
immunologically specific for the signal mediator
protein for the purpose of identifying and quantitating
this mediator in selected cells and tissues.

SUMMARY OF THE INVENTION

This invention provides novel biological
molecules useful for identification, detection and/or
regulation of complex signalling events that regulate
cellular morphological changes. According to one
aspect of the present invention, an isolated nucleic
acid molecule is provided that includes an open reading
frame encoding a mammalian signal mediator protein of a
size between about 795 and about 875 amino acids in
length (preferably about 834 amino acids). The protein
comprises an amino-terminal SH3 domain, an internal
domain that includes a multiplicity of SH2 binding
motifs, and a carboxy-terminal effector domain. When
produced in Saccharomyces cerevisiae, the
carboxy-terminal effector domain is capable of inducing
pseudohyphal budding in the organism under
pre-determined culture conditions. In a preferred
embodiment, an isolated nucleic acid molecule is
provided that includes an open reading frame encoding a
human mammalian signal mediator protein. In a
particularly preferred embodiment, the human signal
mediator protein has an amino acid sequence
substantially the same as Sequence I.D. No. 2. An
exemplary nucleic acid molecule of the invention
comprises Sequence I.D. No. 1.

According to another aspect of the present
invention, an isolated nucleic acid molecule is
provided, which has a sequence selected from the group
consisting of: (1) Sequence I.D. No. 1; (2) a sequence
hybridizing with part or all of the complementary
strand of Sequence I.D. No. 1 and encoding a
polypeptide substantially the same as part or all of a
polypeptide encoded by Sequence I.D. No. 1; and (3) a
sequence encoding part or all of a polypeptide having
amino acid Sequence I.D. No. 2.

According to another aspect of the present
invention, an isolated nucleic acid molecule is
provided which has a sequence that encodes a
carboxy-terminal effector domain of a mammalian signal mediator
protein. This domain has an amino acid sequence of
greater than 74% similarity to the portion of Sequence
I.D. No. 2 comprising amino acids 626-834.

According to another aspect of the present
invention, an isolated mammalian signal mediator
protein is provided which has a deduced molecular
weight of between about 100 kDa and 115 kDa (preferably
about 108 kDa). The protein comprises an
amino-terminal SH3 domain, an internal domain that includes a
multiplicity of SH2 binding motifs, and a
carboxy-terminal effector domain, which is capable of inducing
pseudohyphal budding in Saccharomyces cerevisiae under
pre-determined culture conditions, as described in
greater detail hereinbelow. In a preferred embodiment
of the invention, the protein is of human origin, and
has an amino acid sequence substantially the same as
Sequence I.D. No. 2.

According to another aspect of the present
invention, an isolated mammalian signal mediator
protein is provided, which comprises a carboxy-terminal
effector domain having an amino acid sequence of
greater than 74% similarity to the portion of Sequence
I.D. No. 2 comprising amino acids 626-834. In a
preferred embodiment, the amino acid sequence of the
carboxy-terminal effector domain is greater than about
50% identical to that portion of Sequence I.D. No. 2.

According to another aspect of the present 
invention, antibodies immunologically specific for the 
proteins described hereinabove are provided. 

Various terms relating to the biological
molecules of the present invention are used hereinabove
and also throughout the specifications and claims. The
terms "substantially the same," "percent similarity"
and "percent identity (identical)" are defined in
detail in the description set forth below.

With reference to nucleic acids of the
invention, the term "isolated nucleic acid" is
sometimes used. This term, when applied to DNA, refers
to a DNA molecule that is separated from sequences with
which it is immediately contiguous (in the 5' and 3'
directions) in the naturally occurring genome of the
organism from which it was derived. For example, the
"isolated nucleic acid" may comprise a DNA molecule
inserted into a vector, such as a plasmid or virus
vector, or integrated into the genomic DNA of a
procaryote or eucaryote.

With respect to RNA molecules of the
invention, the term "isolated nucleic acid" primarily
refers to an RNA molecule encoded by an isolated DNA
molecule as defined above. Alternatively, the term may
refer to an RNA molecule that has been sufficiently
separated from RNA molecules with which it would be
associated in its natural state (i.e., in cells or
tissues), such that it exists in a "substantially pure"
form (the term "substantially pure" is defined below).

With respect to protein, the term "isolated
protein" or "isolated and purified protein" is
sometimes used herein. This term refers primarily to a
protein produced by expression of an isolated nucleic
acid molecule of the invention. Alternatively, this
term may refer to a protein which has been sufficiently
separated from other proteins with which it would
naturally be associated, so as to exist in
"substantially pure" form.

The term "substantially pure" refers to a
preparation comprising at least 50-60% by weight the
compound of interest (e.g., nucleic acid,
oligonucleotide, protein, etc.). More preferably, the
preparation comprises at least 75% by weight, and most
preferably 90-99% by weight, the compound of interest.
Purity is measured by methods appropriate for the
compound of interest (e.g., chromatographic methods,
agarose or polyacrylamide gel electrophoresis, HPLC
analysis, and the like).

With respect to antibodies of the invention,
the term "immunologically specific" refers to
antibodies that bind to one or more epitopes of a
protein of interest (e.g., SMP), but which do not
substantially recognize and bind other molecules in a
sample containing a mixed population of antigenic
biological molecules.

With respect to oligonucleotides, the term
"specifically hybridizing" refers to the association
between two single-stranded nucleotide molecules of
sufficiently complementary sequence to permit such
hybridization under pre-determined conditions generally
used in the art (sometimes termed "substantially
complementary"). In particular, the term refers to
hybridization of an oligonucleotide with a
substantially complementary sequence contained within a
single-stranded DNA or RNA molecule of the invention,
to the substantial exclusion of hybridization of the
oligonucleotide with single-stranded nucleic acids of
non-complementary sequence.
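
By way of non-limiting illustration, the notion of "substantially complementary" sequences can be sketched computationally. The short Python example below scores how well a probe oligonucleotide could pair with a single-stranded DNA target by comparing the reverse complement of the probe against each window of the target. The function names and the 90% threshold are assumptions introduced only for this sketch; actual hybridization specificity depends on the experimental conditions (temperature, salt, formamide) and is not determined by sequence matching alone.

```python
# Watson-Crick pairing for DNA bases (RNA is not handled in this sketch).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def best_complementarity(probe: str, target: str) -> float:
    """Highest fraction of probe bases that can pair with any
    same-length window of the single-stranded target."""
    probe, target = probe.upper(), target.upper()
    if not probe or len(probe) > len(target):
        raise ValueError("probe must be non-empty and no longer than target")
    # The probe anneals antiparallel to the target, so a perfect site in the
    # target reads as the probe's reverse complement.
    site = reverse_complement(probe)
    best = 0.0
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        paired = sum(a == b for a, b in zip(site, window))
        best = max(best, paired / len(site))
    return best


def specifically_hybridizes(probe: str, target: str,
                            threshold: float = 0.9) -> bool:
    """Hypothetical cutoff: treat >= 90% pairing as 'substantially
    complementary'. The cutoff is an assumption, not a value from the text."""
    return best_complementarity(probe, target) >= threshold
```

In this sketch a probe equal to the reverse complement of a stretch of the target scores 1.0, whereas a probe of unrelated sequence scores low and is excluded by the cutoff.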

The nucleic acids, proteins and antibodies of
the present invention are useful as research tools and
will facilitate the elucidation of the mechanistic
action of the novel genetic and protein interactions
involved in the control of cellular morphology. They
should also find broad utility as diagnostic and
therapeutic agents for the detection and treatment of
cancer and other proliferative diseases.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGURE 1A-FIGURE 1D. Alignment of nucleotide
sequence (Sequence I.D. No. 1) and deduced amino acid
sequence (Sequence I.D. No. 2) of HEF1, a cDNA of human
origin encoding an exemplary signal mediator protein of
the invention.

FIGURE 2. Amino acid sequence alignment of
the deduced amino acid sequence of HEF1 (Sequence I.D.
No. 2) with homologous sequences of p130cas from rat
(Sequence I.D. No. 3). Boxes represent regions of
sequence identity between the two proteins. The closed
circle marks the site of the initial methionine in the
truncated clone of HEF1. The thick underline denotes
the conserved SH3 domain. Tyrosines are marked with
asterisks.

FIGURE 3. Amino acid sequence alignment of
the carboxy-terminal regions of HEF1-encoded hSMP with
p130cas and the mouse homolog of hSMP, mSMP, encoded by
MEF1 (Sequence I.D. No. 4).

DETAILED DESCRIPTION OF THE INVENTION 

In accordance with the present invention, a
novel gene has been isolated that encodes a protein
involved in the signal transduction pathway that
coordinates changes in cellular growth regulation.
This protein is sometimes referred to herein as "signal
mediator protein" or "SMP."

Using a screen to identify human genes that
promote pseudohyphal conversion in the yeast
Saccharomyces cerevisiae, a 900 bp partial cDNA clone
was obtained that causes strong pseudohyphal growth of
S. cerevisiae on low nitrogen medium. This dimorphic
shift from normal to "pseudohyphal" budding in yeast
has been shown to involve the action of growth
regulatory kinase cascades and cell cycle-related
transcription factors (Gimeno & Fink, Mol. Cell Biol.
14: 2100-2112, 1994; Gimeno et al., Cell 68: 1077-1090,
1992; Blacketer et al., Mol. Cell Biol. 13: 5567-5581,
1993; Liu et al., Science 262: 1741-1744, 1993).

Using the 900 bp partial cDNA clone as a
probe in a combination of screening approaches, a
full-length clone of approximately 3.7 kb was isolated. This
clone encodes a single continuous open reading frame of
about 834 amino acids, which constitutes the signal
mediator protein of the invention. SMP is
characterized by an amino-terminal SH3 domain and an
adjacent domain containing multiple SH2 binding motifs.
The protein also contains a carboxy-terminal "effector"
domain that is capable of inducing the shift to
pseudohyphal budding in yeast. A cDNA encoding a mouse
homolog of the carboxy-terminal "effector" region has
also been identified (Figure 3). Homology searches of
the GenBank database revealed an approximately 64%
similarity on the amino acid level between SMP from
human and the adapter protein, p130cas, recently cloned
from rat (as disclosed by Sakai et al., EMBO J. 13:
3748-3756, 1994). However, p130cas is significantly
larger than SMP (968 amino acids for rat p130cas versus
834 amino acids for human SMP), and differs with respect
to amino acid composition. A comparison of SMP with
p130cas is set forth in greater detail in Example 1.
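
As an illustrative aside, the relationship between the length of the open reading frame and the length of the cDNA can be checked with simple arithmetic: 834 codons plus one stop codon occupy 2,505 nucleotides, so a clone of approximately 3.7 kb would carry on the order of 1.2 kb of untranslated sequence. The Python sketch below shows one simple way of locating the longest ATG-to-stop open reading frame on a given strand. The function names and the toy sequence in the usage note are hypothetical, and the sketch is not the procedure by which the HEF1 reading frame was actually identified.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> str:
    """Return the longest ATG-to-stop open reading frame found in the three
    forward frames of `dna` (stop codon included), or '' if there is none."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def coding_length_nt(n_amino_acids: int) -> int:
    """Nucleotides needed to encode a protein of the given length,
    counting one stop codon."""
    return 3 * n_amino_acids + 3
```

For example, `longest_orf("CCATGAAATGGTAAGG")` finds the reading frame `ATGAAATGGTAA` in the third frame, and `coding_length_nt(834)` gives 2505.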

The aforementioned human partial cDNA clone
that enhanced pseudohyphal formation in yeast encodes
only the carboxy-terminal portion of SMP, comprising
about 182 amino acids. The enhancement of pseudohyphal
formation by the carboxy-terminal fragment of SMP, in
addition to the relatively high degree of homology with
p130cas over this region, indicates that it is this
domain that acts as an effector in regulating cellular
morphology. Thus, this domain is sometimes referred to
herein as a "C-terminal effector domain." It should be
noted that, although the carboxy-terminal fragment of
p130cas was also found capable of enhancing
pseudohyphal formation, it did not do so to the same
extent as the C-terminal domain of SMP (on a scale of 1
to 10, the SMP C-terminal domain is a "10," while the
p130cas C-terminal domain is a "6"). The SMP
C-terminal domain was also found to be involved in
homodimerization and in heterodimerization with p130cas
and, like p130cas, associates with Abl and appears to
be phosphorylated by Abl.

Thus, SMP can be classified within a family
of docking adapters, which includes p130cas, capable of
multiple associations with signalling molecules and
transduction of such signals to coordinate changes in
cellular growth regulation. The SMP protein comprises,
from amino- to carboxy-terminus, an SH3 domain, a
polyproline domain, several SH2 binding motifs, a serine
rich region, and the carboxy-terminal effector domain.

A human clone that encodes an exemplary
signal mediator protein of the invention is sometimes
referred to herein as "HEF1" (human enhancer of
filamentation) to reflect the screening method by which
it was in part identified. The nucleotide sequence of
HEF1 is set forth herein as Sequence I.D. No. 1. The
signal mediator protein encoded by HEF1 is sometimes
referred to herein as hSMP. The amino acid sequence
deduced from Sequence I.D. No. 1 is set forth herein as
Sequence I.D. No. 2. The characteristics of human SMP
are described in greater detail in Example 1.

It is believed that Sequence I.D. No. 1
constitutes a full-length SMP-encoding clone as it
contains a suitable methionine for initiation of
translation. This cDNA is approximately 3.7 kb in
length. Northern analysis of a human multi-tissue RNA
blot (Clontech MINI) suggests a full-length transcript
of approximately 3.4 kb. A second transcript of
approximately 5.4 kb was also observed, which may
represent an alternative splice or initiation site.

Although the human SMP-encoding gene, HEF1,
is described and exemplified herein, this invention is
intended to encompass nucleic acid sequences and
proteins from other species that are sufficiently
similar to be used interchangeably with SMP-encoding
nucleic acids and proteins for the research, diagnostic
and therapeutic purposes described below. Because of
the high degree of conservation of genes encoding
specific signal transducers and related oncogenes, it
will be appreciated by those skilled in the art that,
even if the interspecies SMP homology is low,
SMP-encoding nucleic acids and SMP proteins from a variety
of mammalian species should possess a sufficient degree
of homology with SMP so as to be interchangeably useful
with SMP in such diagnostic and therapeutic
applications. Accordingly, the present invention is
drawn to mammalian SMP-encoding nucleic acids and SMP
proteins, preferably to SMP of primate origin, and most
preferably to SMP of human origin. Accordingly, when
the terms "signal mediator protein" or "SMP" or
"SMP-encoding nucleic acid" are used herein, they are
intended to encompass mammalian SMP-encoding nucleic
acids and SMPs falling within the confines of homology
set forth below, of which hSMP, preferably encoded by
HEF1, is an exemplary member.

Allelic variants and natural mutants of
Sequence I.D. No. 1 are likely to exist within the
human genome and within the genomes of other mammalian
species. Because such variants are expected to possess
certain differences in nucleotide and amino acid
sequence, this invention provides an isolated nucleic
acid molecule and an isolated SMP protein having at
least about 50-60% (preferably 60-80%, most preferably
over 80%) sequence homology in the coding region with
the nucleotide sequence set forth as Sequence I.D. No.
1 (and, preferably, specifically comprising the coding
region of Sequence I.D. No. 1), and the amino acid
sequence of Sequence I.D. No. 2. Because of the
natural sequence variation likely to exist among signal
mediator proteins and nucleic acids encoding them, one
skilled in the art would expect to find up to about
40-50% sequence variation, while still maintaining the
unique properties of the SMP of the present invention.
Such an expectation is due in part to the degeneracy of
the genetic code, as well as to the known evolutionary
success of conservative amino acid sequence variations,
which do not appreciably alter the nature of the
protein. Accordingly, such variants are considered
substantially the same as one another and are included
within the scope of the present invention.
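
The effect of codon degeneracy referred to above can be illustrated with a small sketch. The Python example below builds the standard genetic code table and translates two short, hypothetical coding sequences that differ at most nucleotide positions yet encode the same peptide. The sequences and function names are invented for illustration only and are not taken from Sequence I.D. No. 1.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence (length divisible by 3) into one-letter
    amino acid codes; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)
```

With the hypothetical sequences `ATGCTGAGCCGT` and `ATGTTATCAAGA`, both calls to `translate` return `MLSR`, although the two sequences agree at only 5 of 12 nucleotide positions.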

For purposes of this invention, the term
"substantially the same" refers to nucleic acid or
amino acid sequences having sequence variations that do
not materially affect the nature of the protein (i.e.,
the structure and/or biological activity of the
protein). With particular reference to nucleic acid
sequences, the term "substantially the same" is
intended to refer to the coding region and to conserved
sequences governing expression, and refers primarily to
degenerate codons encoding the same amino acid, or
alternate codons encoding conservative substitute amino
acids in the encoded polypeptide. With reference to
amino acid sequences, the term "substantially the same"
refers generally to conservative substitutions and/or
variations in regions of the polypeptide not involved
in determination of structure or function. The terms
"percent identity" and "percent similarity" are also
used herein in comparisons among amino acid sequences.
These terms are intended to be defined as they are in
the UWGCG sequence analysis program (Devereux et al.,
Nucl. Acids Res. 12: 387-397, 1984), available from the
University of Wisconsin.
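
To make the distinction between the two terms concrete, the following Python sketch computes percent identity and percent similarity for two sequences that have already been aligned (gaps written as "-"). It is an assumption-laden simplification: the grouping of amino acids into conservative classes is a hypothetical stand-in chosen for this example, and it does not reproduce the comparison tables or alignment procedure of the UWGCG programs.

```python
# Hypothetical conservative-substitution classes, for illustration only.
CONSERVATIVE_GROUPS = ("AVLIM", "FYW", "KRH", "DE", "STNQ")
GROUP_OF = {aa: idx
            for idx, group in enumerate(CONSERVATIVE_GROUPS)
            for aa in group}


def percent_identity_and_similarity(aligned_a: str, aligned_b: str):
    """Return (percent identity, percent similarity) over the aligned
    columns in which neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped in this sketch
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif a in GROUP_OF and GROUP_OF.get(b) == GROUP_OF[a]:
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared
```

Because every identical position also counts as similar, percent similarity computed this way is never lower than percent identity, which is consistent with the effector domain being described as greater than 74% similar but only greater than about 50% identical to the reference sequence.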

The following description sets forth the
general procedures involved in practicing the present
invention. To the extent that specific materials are
mentioned, it is merely for purposes of illustration
and is not intended to limit the invention. Unless
otherwise specified, general cloning procedures, such
as those set forth in Sambrook et al., Molecular
Cloning, Cold Spring Harbor Laboratory (1989)
(hereinafter "Sambrook et al.") are used.

I. Preparation of SMP-Encoding Nucleic Acid Molecules,
Signal Mediator Proteins and Antibodies Thereto

A. Nucleic Acid Molecules 

Nucleic acid molecules encoding the SMPs of
the invention may be prepared by two general methods:
(1) They may be synthesized from appropriate nucleotide
triphosphates, or (2) they may be isolated from
biological sources. Both methods utilize protocols
well known in the art.

The availability of nucleotide sequence
information, such as the full length cDNA having
Sequence I.D. No. 1, enables preparation of an isolated
nucleic acid molecule of the invention by
oligonucleotide synthesis. Synthetic oligonucleotides
may be prepared by the phosphoramidite method employed
in the Applied Biosystems 38A DNA Synthesizer or
similar devices. The resultant construct may be
purified according to methods known in the art, such as
high performance liquid chromatography (HPLC). Long,
double-stranded polynucleotides, such as a DNA molecule
of the present invention, must be synthesized in
stages, due to the size limitations inherent in current
oligonucleotide synthetic methods. Thus, for example,
a 3.7 kb double-stranded molecule may be synthesized as
several smaller segments of appropriate
complementarity. Complementary segments thus produced
may be annealed such that each segment possesses
appropriate cohesive termini for attachment of an
adjacent segment. Adjacent segments may be ligated by
annealing cohesive termini in the presence of DNA
ligase to construct an entire 3.7 kb double-stranded
molecule. A synthetic DNA molecule so constructed may
then be cloned and amplified in an appropriate vector.
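
The staged synthesis described above can be sketched in outline. The Python example below divides a long sequence into top-strand and bottom-strand oligonucleotides whose junctions are staggered, so that each annealed segment carries a single-stranded (cohesive) terminus complementary to its neighbor. The function names, the 40-nucleotide segment length and the 8-nucleotide overhang are assumptions chosen for this sketch; practical designs would also weigh melting temperature, secondary structure and synthesizer limits, none of which are modeled here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_segments(top_strand: str, segment_len: int = 40,
                    overhang: int = 8):
    """Split a duplex into oligonucleotides with staggered junctions.

    Returns (top_oligos, bottom_oligos). Top oligos are slices of the top
    strand; bottom oligos are written 5'->3' and listed in top-strand order.
    Bottom-strand junctions are shifted by `overhang` bases relative to the
    top-strand junctions, which leaves cohesive termini between segments.
    """
    if not top_strand:
        raise ValueError("sequence must be non-empty")
    if not 0 < overhang < segment_len:
        raise ValueError("overhang must be between 0 and segment_len")
    top = top_strand.upper()
    n = len(top)
    top_cuts = list(range(0, n, segment_len)) + [n]
    bottom_cuts = ([0]
                   + [c + overhang for c in top_cuts[1:-1] if c + overhang < n]
                   + [n])
    top_oligos = [top[a:b] for a, b in zip(top_cuts, top_cuts[1:])]
    bottom_oligos = [reverse_complement(top[a:b])
                     for a, b in zip(bottom_cuts, bottom_cuts[1:])]
    return top_oligos, bottom_oligos
```

Concatenating the top oligonucleotides, or the reverse complements of the bottom oligonucleotides, regenerates the full sequence; because no junction on one strand faces a junction on the other, each nick is bridged by the opposite strand and can be sealed by DNA ligase.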

Nucleic acid sequences encoding SMP may be
isolated from appropriate biological sources using
methods known in the art. In a preferred embodiment, a
cDNA clone is isolated from an expression library of
human origin. In an alternative embodiment, human
genomic clones encoding SMP may be isolated.
Alternatively, cDNA or genomic clones encoding SMP from
other mammalian species may be obtained.

In accordance with the present invention,
nucleic acids having the appropriate level of sequence
homology with the protein coding region of Sequence
I.D. No. 1 may be identified by using hybridization and
washing conditions of appropriate stringency. For
example, hybridizations may be performed, according to
the method of Sambrook et al., using a hybridization
solution comprising: 5X SSC, 5X Denhardt's reagent,
1.0% SDS, 100 µg/ml denatured, fragmented salmon sperm
DNA, 0.05% sodium pyrophosphate and up to 50%
formamide. Hybridization is carried out at 37-42°C for
at least six hours. Following hybridization, filters
are washed as follows: (1) 5 minutes at room
temperature in 2X SSC and 1% SDS; (2) 15 minutes at
room temperature in 2X SSC and 0.1% SDS; (3) 30
minutes to 1 hour at 37°C in 1X SSC and 1% SDS; (4) 2
hours at 42-65°C in 1X SSC and 1% SDS, changing the
solution every 30 minutes.
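
For bench planning, the four-step wash series above can be written down as a small data structure. The Python sketch below records each wash as stated in the text and totals the minimum and maximum wash time. The class and function names are hypothetical, and the values should be adjusted to whatever stringency a given experiment actually requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashStep:
    min_minutes: int
    max_minutes: int
    temperature: str
    ssc: str
    sds_percent: float


# The four washes, transcribed from the protocol described in the text.
WASHES = (
    WashStep(5, 5, "room temperature", "2X", 1.0),
    WashStep(15, 15, "room temperature", "2X", 0.1),
    WashStep(30, 60, "37 C", "1X", 1.0),
    WashStep(120, 120, "42-65 C", "1X", 1.0),
)


def total_wash_minutes(steps):
    """Return (minimum, maximum) total wash time in minutes."""
    return (sum(s.min_minutes for s in steps),
            sum(s.max_minutes for s in steps))


def wash_intervals(step_minutes: int, interval_minutes: int = 30) -> int:
    """Number of full intervals of fresh solution within one wash step."""
    return step_minutes // interval_minutes
```

Under these figures the complete series takes between 170 and 200 minutes, and the final two-hour wash comprises four 30-minute intervals of fresh solution.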

Nucleic acids of the present invention may be
maintained as DNA in any convenient cloning vector. In
a preferred embodiment, clones are maintained in a
plasmid cloning/expression vector, such as pBluescript
(Stratagene, La Jolla, CA), which is propagated in a
suitable E. coli host cell.

SMP-encoding nucleic acid molecules of the
invention include cDNA, genomic DNA, RNA, and fragments
thereof which may be single- or double-stranded. Thus,
this invention provides oligonucleotides (sense or
antisense strands of DNA or RNA) having sequences
capable of hybridizing with at least one sequence of a
nucleic acid molecule of the present invention, such as
selected segments of the cDNA having Sequence I.D. No.
1. Such oligonucleotides are useful as probes for
detecting SMP genes in test samples of potentially
malignant cells or tissues, e.g., by PCR amplification,
or for the isolation of homologous regulators of
morphological control.

B. Proteins

A full-length SMP of the present invention
may be prepared in a variety of ways, according to
known methods. The protein may be purified from
appropriate sources, e.g., human or animal cultured
cells or tissues, by immunoaffinity purification.
However, this is not a preferred method due to the low
amount of protein likely to be present in a given cell
type at any time.

The availability of nucleic acid molecules
encoding SMP enables production of the protein using in
vitro expression methods known in the art. For
example, a cDNA or gene may be cloned into an
appropriate in vitro transcription vector, such as pSP64
or pSP65 for in vitro transcription, followed by
cell-free translation in a suitable cell-free translation
system, such as wheat germ or rabbit reticulocytes. In
vitro transcription and translation systems are
commercially available, e.g., from Promega Biotech,
Madison, Wisconsin or BRL, Rockville, Maryland.

Alternatively, according to a preferred
embodiment, larger quantities of SMP may be produced by
expression in a suitable procaryotic or eucaryotic
system. For example, part or all of a DNA molecule,
such as the cDNA having Sequence I.D. No. 1, may be
inserted into a plasmid vector adapted for expression
in a bacterial cell, such as E. coli, or into a
baculovirus vector for expression in an insect cell.
Such vectors comprise the regulatory elements necessary
for expression of the DNA in the bacterial host cell,
positioned in such a manner as to permit expression of
the DNA in the host cell. Such regulatory elements
required for expression include promoter sequences,
transcription initiation sequences and, optionally,
enhancer sequences.

The SMP produced by gene expression in a
recombinant procaryotic or eucaryotic system may be
purified according to methods known in the art. In a
preferred embodiment, a commercially available
expression/secretion system can be used, whereby the
recombinant protein is expressed and thereafter
secreted from the host cell, to be easily purified from
the surrounding medium. If expression/secretion
vectors are not used, an alternative approach involves
purifying the recombinant protein by affinity
separation, such as by immunological interaction with
antibodies that bind specifically to the recombinant
protein. Such methods are commonly used by skilled
practitioners.

The signal mediator proteins of the
invention, prepared by the aforementioned methods, may
be analyzed according to standard procedures. For
example, such proteins may be subjected to amino acid
sequence analysis, according to known methods.

The present invention also provides
antibodies capable of immunospecifically binding to
proteins of the invention. Polyclonal antibodies
directed toward SMP may be prepared according to
standard methods. In a preferred embodiment,
monoclonal antibodies are prepared, which react
immunospecifically with various epitopes of SMP.
Monoclonal antibodies may be prepared according to
general methods of Kohler and Milstein, following
standard protocols. Polyclonal or monoclonal
antibodies that immunospecifically interact with SMP
can be utilized for identifying and purifying such
proteins. For example, antibodies may be utilized for
affinity separation of proteins with which they
immunospecifically interact. Antibodies may also be
used to immunoprecipitate proteins from a sample
containing a mixture of proteins and other biological
molecules. Other uses of anti-SMP antibodies are
described below.

II. Uses of SMP-Encoding Nucleic Acids, Signal
Mediator Proteins and Antibodies Thereto

Cellular signalling molecules have received a
great deal of attention as potential prognostic
indicators of neoplastic disease and as therapeutic
agents to be used for a variety of purposes in cancer
chemotherapy. As a signalling molecule that induces
profound morphological changes, SMP and related
proteins from other mammalian species promise to be
particularly useful research tools, as well as
diagnostic and therapeutic agents.

A. SMP-Encoding Nucleic Acids 

SMP-encoding nucleic acids may be used for a
variety of purposes in accordance with the present
invention. SMP-encoding DNA, RNA, or fragments thereof
may be used as probes to detect the presence of and/or
expression of genes encoding SMP. Methods in which
SMP-encoding nucleic acids may be utilized as probes
for such assays include, but are not limited to: (1) in
situ hybridization; (2) Southern hybridization; (3)
northern hybridization; and (4) assorted amplification
reactions such as polymerase chain reactions (PCR).

The SMP-encoding nucleic acids of the
invention may also be utilized as probes to identify
related genes either from humans or from other species.
As is well known in the art, hybridization stringencies
may be adjusted to allow hybridization of nucleic acid
probes with complementary sequences of varying degrees
of homology. Thus, SMP-encoding nucleic acids may be
used to advantage to identify and characterize other
genes of varying degrees of relation to SMP, thereby
enabling further characterization of the signalling
cascade involved in the morphological control of
different cell types. Additionally, they may be used
to identify genes encoding proteins that interact with
SMP (e.g., by the "interaction trap" technique), which
should further accelerate elucidation of these cellular
signalling mechanisms.

Nucleic acid molecules, or fragments thereof,
encoding SMP may also be utilized to control the
expression of SMP, thereby regulating the amount of
protein available to participate in oncogenic
signalling pathways. Alterations in the physiological
amount of "adapter protein" may act synergistically
with chemotherapeutic agents used to treat cancer. In
one embodiment, the nucleic acid molecules of the
invention may be used to decrease expression of SMP in
a population of malignant cells. In this embodiment,
SMP proteins would be unavailable to serve as substrate
acceptors for phosphorylation events mediated by
oncogenes, thereby effectively abrogating the activation
signal. To this end, antisense oligonucleotides
are employed which are targeted to specific regions of
SMP-encoding genes that are critical for gene
expression. The use of antisense oligonucleotides to
decrease expression levels of a pre-determined gene is
known in the art. In a preferred embodiment, such
antisense oligonucleotides are modified in various ways
to increase their stability and membrane permeability,
so as to maximize their effective delivery to target
cells in vitro and in vivo. Such modifications include
the preparation of phosphorothioate or
methylphosphonate derivatives, among many others,
according to procedures known in the art.
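
The antisense approach described above can be illustrated with a minimal sketch. The Python below derives an oligonucleotide complementary to the sense strand around the translation start codon, using the first 40 nucleotides of Sequence I.D. No. 1 as given in the Sequence Listing. The choice of the start-codon region as the target, the 20-nucleotide length, and the function names are assumptions made for illustration only; the specification does not single out particular target regions or oligonucleotide lengths.

```python
# First 40 nucleotides of Sequence I.D. No. 1 (sense strand, 5'->3').
SENSE_5PRIME = "ACCCCCACGCTACCGAAATGAAGTATAAGAATCTTATGGC"

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_oligo(sense: str, center: int, length: int = 20) -> str:
    """Return an antisense oligo complementary to a window of the sense strand.

    `center` is a 0-based index into `sense`; the window starts length // 2
    bases upstream of it (hypothetical design rule, for illustration only).
    """
    start = max(0, center - length // 2)
    target = sense[start:start + length]
    return reverse_complement(target)


# Locate the first ATG and build an oligo spanning it.
atg_index = SENSE_5PRIME.find("ATG")
oligo = antisense_oligo(SENSE_5PRIME, atg_index + 1)
print(oligo)
```

The resulting oligonucleotide contains CAT, the complement of the ATG start codon; chemical modifications such as phosphorothioate linkages are outside the scope of this sketch.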

In another embodiment, overexpression of SMP
is induced in a target population of cells to generate
an excess of signal adapter molecules. This excess
allows SMP to serve as a phosphorylation "sink" for the
kinase activity of transforming oncogenes.
Overexpression of SMP could lead to alterations in the
cytoskeleton which could then be monitored with
immunofluorescence or any other standard technique
known in the art. Alternatively, overexpression of SMP
by this method may facilitate the isolation and
characterization of other components involved in the
protein-protein complex formation that occurs via the
SH2 homology domains during signal transduction.

As described above, SMP-encoding nucleic
acids are also used to advantage to produce large
quantities of substantially pure SMP protein, or
selected portions thereof. In a preferred embodiment,
the C-terminal "effector domain" of SMP is produced by
expression of a nucleic acid encoding the domain. The
full-length protein or selected domain is thereafter
used for various research, diagnostic and therapeutic
purposes, as described below.

B. Signal Mediator Protein and Antibodies

Purified SMP, or fragments thereof, may be
used to produce polyclonal or monoclonal antibodies
which also may serve as sensitive detection reagents
for the presence and accumulation of SMP (or complexes
containing SMP) in cultured cells or tissues from
living patients (the term "patients" refers to both
humans and animals). Recombinant techniques enable
expression of fusion proteins containing part or all of
the SMP protein. The full length protein or fragments
of the protein may be used to advantage to generate an
array of monoclonal antibodies specific for various
epitopes of the protein, thereby providing even greater
sensitivity for detection of the protein in cells or
tissue.

Polyclonal or monoclonal antibodies
immunologically specific for SMP may be used in a
variety of assays designed to detect and quantitate the
protein, which may be useful for rendering a prognosis
as to a malignant disease. Such assays include, but
are not limited to: (1) flow cytometric analysis; (2)
immunochemical localization of SMP in cultured cells or
tissues; and (3) immunoblot analysis (e.g., dot blot,
Western blot) of extracts from various cells and
tissues. Additionally, as described above, anti-SMP
antibodies can be used for purification of SMP (e.g.,
affinity column purification, immunoprecipitation).

Anti-SMP antibodies may also be utilized as
therapeutic agents to block the normal functionality of
SMP in a target cell population, such as a tumor.
Thus, similar to the antisense oligonucleotides
described above, anti-SMP antibodies may be delivered
to a target cell population by methods known in the art
(e.g., through various lipophilic carriers that enable
delivery of the compound of interest to the target cell
cytoplasm) where the antibodies may interact with
intrinsic SMP to render it nonfunctional.

From the foregoing discussion, it can be seen
that SMP-encoding nucleic acids and SMP proteins of the
invention can be used to detect SMP gene expression and
protein accumulation for purposes of assessing the
genetic and protein interactions involved in the
regulation of morphological control pathways of a cell
or tissue sample. Aberrant morphological changes
often correlate with metastatic cellular
proliferation in various cancers, such as breast
cancer. It is expected that these tools will be
particularly useful for diagnosis and prognosis of
human neoplastic disease. Potentially of greater
significance, however, is the utility of SMP-encoding
nucleic acids, proteins and antibodies as therapeutic
agents to disrupt the signal transduction pathways
mediated by activated oncogenes that result in aberrant
morphological cellular alterations.

Although the compositions of the invention
have been described with respect to human diagnostics
and therapeutics, it will be apparent to one skilled in
the art that these tools will also be useful in animal
and cultured cell experimentation with respect to
various malignancies and/or other conditions manifested
by alterations in cellular morphology. As diagnostic
agents they can be used to monitor the effectiveness of
potential anti-cancer agents on signal transduction
pathways mediated by oncogenic proteins in vitro,
and/or the development of neoplasms or malignant
diseases in animal model systems. As therapeutics,
they can be used either alone or as adjuncts to other
chemotherapeutic drugs in animal models and veterinary
applications to improve the effectiveness of such
anti-cancer agents.

The following Example is provided to describe
the invention in further detail. This Example is
intended to illustrate and not to limit the invention.

EXAMPLE 1

Isolation and Characterization of a 
Nucleic Acid Molecule Encoding Human SMP 

In this Example, we describe the cloning of a
cDNA molecule encoding human SMP. This cDNA is
sometimes referred to herein as HEF1 for human enhancer
of filamentation, because of its identification in the
pseudohyphal screen. We also provide an analysis of
the structure of the human SMP (hSMP) as predicted from
the deduced amino acid sequence encoded by the cDNA.
Additionally, we describe antibodies immunospecific
for the recombinant hSMP protein, and their use in
immunological detection of phosphorylated SMP from
normal and Abl transformed NIH3T3 cells.

Isolation of cDNA and cloning 

A HeLa cDNA library constructed in the
TRP1+ vector JG4-4 (Gyuris et al., Cell 75:791-803, 1993),
with inserts expressed as native proteins
under the control of the galactose-inducible GAL1
promoter, was transformed into CGx74 yeast (MATa/α
trp1/trp1; see Gimeno et al., 1992, supra). TRP+
transformants were
plated to the nitrogen-restricted SLAGR medium (like
SLAD, but with 2% galactose, 1% raffinose as a carbon
source), and 120,000 colonies were visually screened
using a Wild dissecting microscope at 50x magnification
to identify colonies that produced pseudohyphae more
extensively than background. cDNAs from these colonies
were isolated and retransformed to naive CGx74; those
that reproducibly generated enhanced pseudohyphae were
sequenced. A 900 bp cDNA encoding a 182 amino acid
open reading frame corresponding to the COOH-terminus
of hSMP (HEF1-Cterm182) possessed the most dramatic
phenotype of any cDNA obtained in this screen. Using the
original 900 bp cDNA isolated in the pseudohyphal
screen to probe a placental cDNA library cloned in
lambda gt11, a larger clone (3.4 kb) was isolated. The
longer clone obtained in this screen was used as a
basis for 5' RACE using a kit from Clontech containing
RACE-ready cDNA prepared from human kidney. Three
independent clones from the RACE approach yielded
identical 5' end-points located 18 base pairs upstream
of the ATG encoding the first methionine in the
sequence shown in Figure 1. Repeated efforts with
multiple primer sets showed no evidence for an
N-terminally extended sequence. The full length clone,
HEF1, is about 3.7 kb and encodes a protein about 835
amino acids in length.
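
The position of the initiating ATG and the start of the open reading frame can be checked against the Sequence Listing with a short sketch. The Python below takes the first 120 nucleotides of Sequence I.D. No. 1 as listed, locates the first ATG, and translates in frame using the standard genetic code. The function and variable names are illustrative assumptions and are not part of the original methods.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotides 1-120 of Sequence I.D. No. 1, as given in the Sequence Listing.
HEF1_5PRIME = (
    "ACCCCCACGCTACCGAAATGAAGTATAAGAATCTTATGGCAAGGGCCTTATATGACAATG"
    "TCCCAGAGTGTGCCGAGGAACTGGCCTTTCGCAAGGGAGACATCCTGACCGTCATAGAGC"
)


def translate_from_first_atg(dna: str) -> tuple[int, str]:
    """Return (0-based index of the first ATG, in-frame translation).

    Translation stops at the first stop codon or at the last complete codon.
    """
    start = dna.find("ATG")
    if start < 0:
        raise ValueError("no ATG found")
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return start, "".join(protein)


start, peptide = translate_from_first_atg(HEF1_5PRIME)
print(start + 1, peptide)  # 1-based position of the ATG and the N-terminal peptide
```

In the listing as reproduced here, the first ATG begins at nucleotide 18, and the translated peptide begins MKYKNLMARALYDNVP, in agreement with the first residues of Sequence I.D. No. 2.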

Sequence Analysis 

Both strands of the HEF1 clone were sequenced
using oligonucleotide primers to the JG4-4 vector and
to internal HEF1 sequences in combination with the
Sequenase system (United States Biochemical). Database
searching was performed using the BLAST algorithm
(Altschul et al., J. Mol. Biol. 215:403-410, 1990) and
sequence analysis was carried out using the package of
programs from UWGCG (Devereux et al., Nucl. Acids Res.
12:387-397, 1984).

Northern Analysis 

HEF1 cDNA was labelled with 32P-dCTP by random
priming, and used to probe a Northern blot containing 2
µg/lane human mRNA from multiple tissues. The blot was
stripped and reprobed with a 32P-labelled
oligonucleotide specific for actin as a control for
equivalent loading.

Immunoprecipitation and Western Blotting 

Immunoprecipitation of hSMP from normal and
Abl transformed NIH 3T3 cells was accomplished using
polyclonal antiserum raised against a peptide derived
from the hSMP C-terminus. Immunoprecipitates were
resolved by electrophoresis on a 12% SDS-polyacrylamide
gel. Following electrophoresis, immunoprecipitates were
transferred to nitrocellulose, and probed with
anti-phosphotyrosine antibody (4G10).



Growth Profiles 

Yeast were transformed with HEF1 or vector
alone and grown to saturated overnight cultures in trp-
glucose defined minimal medium, and re-diluted to OD600
<0.05 in trp- galactose for growth curves. Growth
curves were performed, with readings taken at 90 minute
intervals for 12 hours, and at less frequent intervals
up to 48 hours or longer.
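
One way to compare growth curves of this kind is to estimate a doubling time from the exponential portion of each curve. The following Python is a minimal sketch of that calculation by least-squares fitting of log2(OD600) against time. The OD600 values shown are hypothetical placeholders chosen for illustration, not data from this Example, and the function name is an assumption.

```python
import math


def doubling_time(times_h, od600):
    """Estimate doubling time (hours) from exponential-phase OD600 readings.

    Fits log2(OD600) against time by ordinary least squares; the slope is
    doublings per hour, so its reciprocal is the doubling time.
    """
    logs = [math.log2(v) for v in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxx / sxy


# Hypothetical readings at 90-minute intervals (illustration only).
times = [0.0, 1.5, 3.0, 4.5, 6.0]
vector_od = [0.05, 0.10, 0.20, 0.40, 0.80]

print(round(doubling_time(times, vector_od), 2))
```

Similar doubling times for cultures carrying the cDNA and the vector control would be consistent with the cDNA not being simply toxic; only readings taken during exponential growth should be included in the fit.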

Interaction Trap or Two Hybrid Analysis 

EGY48 yeast (Gyuris et al., 1993, supra) were
transformed by standard methods with plasmids
expressing LexA-fusions, activation-domain fusions, or
both, together with the LexA operator-LacZ reporter
SH18-34 (Gyuris et al., 1993, supra). For all fusion
proteins, synthesis of a fusion protein of the correct
length in yeast was confirmed by Western blot assays of
yeast extracts (Samson et al., Cell 57:1045-1052,
1989) using polyclonal antiserum specific for LexA
(Brent and Ptashne, Nature 312:612-615, 1984) or for
hemagglutinin (Babco, Inc.), as appropriate. Activation
of the LacZ reporter was determined as previously
described (Brent and Ptashne, Cell 43:729-736, 1985).
Beta-galactosidase assays were performed on three
independent colonies, on three separate occasions, and
values for particular plasmid combinations varied less
than 25%. Activation of the LEU2 reporter was
determined by observing the colony forming ability of
yeast plated on complete minimal medium lacking
leucine. The LexA-PRD/HD expressing plasmid has been
described (Golemis and Brent, Mol. Cell Biol.
12:3006-3014, 1992).

RESULTS

Overexpression of the C-terminal domain of
SMP influences Saccharomyces cerevisiae cell
morphology. To identify proteins that regulate the
morphology and polarity of human cells, a human cDNA
library was screened for genes which enhanced formation
of pseudohyphae when expressed in S. cerevisiae. The
yeast undergoes a dimorphic shift in response to severe
nitrogen limitation that involves changes in budding
pattern, cell cycle control, cell elongation, and
invasive growth into agar (Gimeno et al., 1992, supra).
A galactose-inducible HeLa cell cDNA library was used
to transform a yeast strain that can form pseudohyphae
on nitrogen-restricted media, and a number of human
genes which specifically enhanced pseudohyphal
formation were identified. One of the cDNAs derived
from this screen was found to cause the constitutive
formation of pseudohyphae on rich and nitrogen
restricted media. This cDNA is sometimes referred to
as "HEF1-Cterm182" (because it encodes 182 amino acids
of the C-terminal domain of the human SMP). A
full-length clone containing the cDNA sequence was
thereafter obtained. Analysis of the sequence of this
cDNA (Sequence I.D. No. 1; Figure 1) revealed that it
was a novel human gene with strong sequence similarity
to the rat p130cas gene (as disclosed by Sakai et al.,
EMBO J. 13:3748-3756, 1994). This gene was designated
HEF1, and its encoded protein was designated hSMP
(Sequence I.D. No. 2). A comparison of the amino acid
compositions (% by weight) of the HEF1-encoded hSMP and
the rat p130cas is shown in Table 1 below.

TABLE 1



Amino Acid % Composition

Amino Acid        hSMP     p130cas

Alanine           4.3      6.2
Arginine          6.1      7.5
Asparagine        4.1      1.8
Aspartic acid     5.6      6.5
Cysteine          1.5      0.6
Glutamine         8.3      8.1
Glutamic acid     6.6      5.8
Glycine           3.5      4.5
Histidine         4.0      3.1
Isoleucine        4.2      1.6
Leucine           8.7      9.6
Lysine            6.2      4.8
Methionine        2.8      1.0
Phenylalanine     3.2      1.6
Proline           7.0      11.1
Serine            6.6      6.7
Threonine         4.8      4.9
Tryptophan        1.1      1.1
Tyrosine          4.8      4.7
Valine            5.6      7.7

The deduced length of HEF1-encoded hSMP is
834 amino acids and its deduced molecular weight is
about 107,897 Da. The deduced length of the rat
p130cas is 968 amino acids and its deduced molecular
weight is about 121,421 Da.
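
A composition by weight of the kind tabulated above can be derived from a deduced amino acid sequence as sketched below. The Python uses standard average residue masses and, as input, residues 1-32 of Sequence I.D. No. 2. This is an illustrative calculation under stated assumptions: the specification does not say which mass values or software produced Table 1 or the deduced molecular weights, so the figures from this sketch are not expected to reproduce them.

```python
# Average residue masses in daltons (standard textbook values; an assumption,
# since the specification does not state which values were used).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water per chain (free N- and C-termini)


def molecular_weight(protein: str) -> float:
    """Approximate molecular weight of an unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def percent_by_weight(protein: str) -> dict[str, float]:
    """Percentage of total residue mass contributed by each amino acid."""
    total = sum(RESIDUE_MASS[aa] for aa in protein)
    mass_by_residue: dict[str, float] = {}
    for aa in protein:
        mass_by_residue[aa] = mass_by_residue.get(aa, 0.0) + RESIDUE_MASS[aa]
    return {aa: 100.0 * m / total for aa, m in mass_by_residue.items()}


# Residues 1-32 of Sequence I.D. No. 2 (one-letter code).
N_TERMINUS = "MKYKNLMARALYDNVPECAEELAFRKGDILTV"

composition = percent_by_weight(N_TERMINUS)
for aa in sorted(composition):
    print(aa, round(composition[aa], 1))
```

Applied to a full-length sequence, the same functions give a per-residue breakdown and an approximate chain mass; post-translational modifications such as phosphorylation are not accounted for.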

Tissue specific expression of HEF1. RNA
production was assessed by Northern blot analysis.
HEF1 is expressed as two predominant transcripts of
approximately 3.4 and 5.4 kb. Although present in all
tissues examined (heart, brain, placenta, lung, liver,
skeletal muscle, kidney and pancreas), these
transcripts are present at significantly higher levels
in kidney, lung, and placenta. In contrast, a more
uniform distribution throughout the body has been
reported for p130cas. Two other cross-hybridizing
minor species were detected, migrating at 8.0 kb in
lung and 1.2 kb in liver. These may represent
alternatively spliced HEF1 transcripts or other
HEF1/p130cas related genes. HEF1 represents a distinct
gene from p130cas rather than a human homolog, inasmuch
as a screen of a murine genomic library with HEF1 cDNA
led to identification of an exon that encoded a mouse
C-terminal effector protein having a sequence
essentially identical to hSMP-Cterm182 (Figure 3).
Furthermore, probing a zoo blot at high stringency
with a HEF1 cDNA probe indicates that this gene is highly
conserved from humans to yeast.

hSMP does not induce constitutive
pseudohyphal budding by causing severe cell stress.
The possibility that the C-terminal domain of hSMP was
enhancing pseudohyphae formation by causing severe cell
stress was excluded by comparing the growth rates of
yeast containing the HEF1-Cterm182 cDNA to yeast
containing the expression vector control on plates and
in liquid culture, with galactose as a sugar source to
induce expression of HEF1-Cterm182. The growth rate
data show that SMP-encoding genes are not simply toxic
to yeast.

SMP belongs to a class of "adapter proteins"
important in signalling cascades influencing
morphological control. The HEF1 gene is approximately
3.7 kb and encodes a single continuous open reading
frame of about 835 amino acids. The predicted hSMP
protein notably contains an amino-terminal SH3 domain
and an adjacent domain containing multiple SH2 binding
motifs. Homology search of the Genbank database
revealed that hSMP is 64% similar at the amino acid
level to the adapter protein p130cas, recently cloned
from rat (Sakai et al., EMBO J. 13:3748-3756, 1994).
The amino acid alignment of hSMP and p130cas is shown
in Figure 2. p130cas was determined to be the
predominant phosphorylated species in cells following
transformation by the oncoprotein Crk and also
complexes with, and is a substrate for, Abl and Src. As
shown in Table 2 below, the homology between SMP and
p130cas is most pronounced over the SH3 domain (92%
similarity, 74% identity) and in the region
corresponding to the SMP-Cterm182 fragment (74%
similarity, 57% identity). Although the domain
containing SH2-binding motifs is more divergent from
p130cas, SMP similarly possesses a large number of
tyrosines in this region. The majority of SH2 binding
sites in p130cas match the consensus for the SH2 domain
of the oncoprotein Crk, while the amino acids flanking
the tyrosine residues in SMP are more diverse,
suggesting a broader range of associating proteins.
Various SH2 binding motifs conserved between hSMP and
p130cas are shown in Table 3.

TABLE 2

Domain Alignment: hSMP and p130cas

(Domains from amino to carboxyl terminus down the Table)

Domain                       Size (a.a.)        % Similarity/Identity
                             hSMP    p130cas    (hSMP : p130cas)

SH3                          50      50         92% similar, 74% identical
Polyproline                  10      38         (not compared)
SH2 binding motifs           290     410        55% similar, 36% identical
Serine-rich region           250     260        56% similar, 35% identical
C-terminal effector domain   210     210        74% similar, 57% identical
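
Percent identity and percent similarity figures of the kind listed in Table 2 are computed over the aligned positions of two sequences. The Python below is a minimal sketch of that calculation. The similarity groups, the treatment of gapped columns (excluded from the denominator), and the two short aligned strings are all assumptions made for illustration; the alignment program used for this Example may score similarity differently, and the strings are not taken from the hSMP/p130cas alignment.

```python
# Groups of residues treated as "similar" (an illustrative assumption).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]


def is_similar(a: str, b: str) -> bool:
    """True if two different residues fall in the same similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (% identity, % similarity) over ungapped alignment columns.

    Inputs must be the two rows of a pairwise alignment, with '-' for gaps.
    Similarity counts identical plus similar residue pairs.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if "-" not in (a, b)]
    identical = sum(a == b for a, b in pairs)
    conserved = sum(a == b or is_similar(a, b) for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * conserved / len(pairs)


# Hypothetical aligned fragments (illustration only).
ident, sim = identity_similarity("YDIPPS-HT", "YDVPPTAHS")
print(ident, sim)
```

For these hypothetical fragments, five of the eight ungapped columns are identical and the remaining three are similar, giving 62.5% identity and 100% similarity.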



TABLE 3

Conserved SH2 Binding Motifs and Associating Proteins

SH2 Binding Motif        Associating Proteins

YDIP, YDVP, YDFP         Crk
YEYP                     Vav or fps/fes
YAIP                     Abl
YQNQ                     Grb2
YQVP, YQKD, YVYE,        Novel
YPSR, YNCD

The enhancement of pseudohyphal formation by the
hSMP-Cterm182 fragment, in addition to the relatively high
degree of homology to p130cas, suggests that this domain
acts as an effector in regulating cellular morphology. A
test was performed to assay whether the homologous region
of p130cas also enhanced pseudohyphal formation. The
results show that the C-terminal fragment of p130cas did
enhance pseudohyphal formation but not to the same extent
as the C-terminal fragment of SMP. SMP was found to
induce the strongest pseudohyphal phenotype of any cDNA
fragment. By comparison, p130cas and another
pseudohyphal inducer, RBP7 (subunit 7 of human RNA
polymerase II, Golemis et al., Mol. Biol. of the Cell,
1995, in press) were only about 60% as effective as the
hSMP-Cterm182 fragment.

The possible functions for the novel
carboxy-terminal domains were investigated further using
two-hybrid analysis. These experiments revealed that this
domain mediated SMP homodimerization, and SMP/p130cas
heterodimerization, yet failed to interact with
non-specific control proteins.

SMP is a substrate for oncogene mediated
phosphorylation. SMP was immunoprecipitated from normal
and v-Abl transformed NIH3T3 cells using polyclonal
antisera raised against a MAP peptide derived from the
hSMP C-terminal domain. Probing these
immunoprecipitates with antibody to phosphotyrosine
revealed a species migrating at approximately 130-140 kD
that was specifically observed in Abl-transformed
fibroblasts. This species may represent SMP
phosphorylated by Abl, as SMP possesses a good match to
the SH2 binding domain recognized by Abl. The larger
apparent molecular weight as compared with the hSMP deduced
molecular weight may reflect glycosylation or may be a
result of its phosphorylated state.

SMP dimerizes with other important cellular
regulatory proteins. To assay whether SMP dimerizes with
other cellular proteins, the interaction trap/two hybrid
analysis system was used. Briefly, a LexA-fusion and an
epitope-tagged, activation-domain fusion to SMP were
synthesized. The expression of proteins of the predicted
size in yeast was confirmed using antibodies specific for
the fusion moieties. Using a LexA-operator reporter, it
was observed that LexA-SMP fusion protein activates
transcription extremely weakly. However, LexA-SMP is
able to interact with co-expressed activation
domain-fused SMP to activate transcription of the reporter,
indicating that it is able to form dimers (or higher
order multimers).

SMP joins p130cas in defining a new family of
docking adapters that, through multiple associations with
signalling molecules via SH2 binding domains, is likely
to coordinate changes in cellular growth regulation. The
interactions between SMP homodimers and SMP-p130cas
heterodimers may negatively regulate SMP and p130cas
proteins by making them inaccessible to their targets.
Alternatively, SMP and p130cas could work together to
recruit new proteins to the signalling complex. The fact
that the novel C-terminal domain shared between SMP and
p130cas has the ability to cause pseudohyphal formation
in yeast suggests that these proteins may directly alter
cellular morphology by interacting with the cytoskeleton.
In fact, previous yeast-morphology based screens for
higher eucaryotic proteins have tended to isolate
cytoskeletally related proteins. This invention
therefore provides reagents influencing the changes in
cell morphology that accompany oncoprotein-mediated
transformation in carcinogenesis.

The present invention is not limited to the
embodiments specifically described above, but is capable
of variation and modification without departure from the
scope of the appended claims.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: Golemis, Erica A.
Law, Susan F.
Estojak, JoAnne

(ii) TITLE OF INVENTION: NUCLEIC ACID MOLECULE ENCODING A
SIGNAL MEDIATOR PROTEIN THAT INDUCES CELLULAR
MORPHOLOGICAL ALTERATIONS

(iii) NUMBER OF SEQUENCES: 4

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Dann, Dorfman, Herrell and Skillman
(B) STREET: 1601 Market Street Suite 720
(C) CITY: Philadelphia
(D) STATE: PA
(E) COUNTRY: USA
(F) ZIP: 19103-2307

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE: 30-June-1995
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Reed, Janet E.
(B) REGISTRATION NUMBER: 36,252

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (215) 563-4100
(B) TELEFAX: (215) 563-4044

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3672 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ACCCCCACGC TACCGAAATG AAGTATAAGA ATCTTATGGC AAGGGCCTTA TATGACAATG 60 

TCCCAGAGTG TGCCGAGGAA CTGGCCTTTC GCAAGGGAGA CATCCTGACC GTCATAGAGC 120 

65 AGAACACAGG GGGACTGGAA GGATGGTGGC TGTGCTCGTT ACACGGTCGG CAAGGCATTG 180 

TCCCAGGCAA CCGGGTGAAG CTTCTGATTG GCCCCATGCA GGAGACTGCC TCCAGTCACG 240 
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AGCAGCCTGC CTCTGGACTG ATGCAGCAGA 
CAAACCCACA GGCTGCTCCC CGAGACACTA 
5 AGGGAATTTA CCAAGTCCCC ACTGGCCACG 
CACCATCAGT GCAGAGAAGC ATTGGGGGAA 
TAACCCCCGT GAGGACAGGC CATGGCTACG 

10 

ATGTCTATGA TATCCCTCCT TCTCATACCA 
CAGCAAAAGG CCCTGTGTTT TCAGTTCCAG 
15 ACATCCCGCC TACAAAAGGG GTATATGCCA 
GGCTTAGGGA AAAAGACTAT GACTTCCCCC 
TCAGACCGGA GGGGGTTTAT GACATTCCTC 

20 

TTCATGTAAA ATACAACTGT GACATTCCAG 
AGAGCCTGTC CCCGAATCAC CCACCCCCGC 
25 ACGCATATGA TGTCCCCCGA GGCGTTCAGT 
AAGCAAACCC CCAGGAAAGG GATGGTGTTT 
CTAAAGGCTC TCGGGACTTG GTGGATGGGA 

30 

GCACCCGGAG TAACATGTCC ACGTCTTCCA 
CCCCAGCTCA GGACAAAAGG CTCTTCCTGG 
35 GGCTCCAGCA GGCCCTTGAG ATGGGTGTCT 
GGCGGTGTTA CGGATATATG GAAAGACACA 
TGGAGCTGTT CCTGAAGGAG TACCTCCACT 

40 

GCCTCCCGGA ACTCATCCTC CACAACAAGA 
CCCACCAGAT CCTGAGTCAA ACCAGCCATG 
45 TCTTGGCCAT CAACAAGCCC CAGAACAAGT 
CAAAGACGGT GCCCGATGAC GCCAAGCAGC 
CCCTCTTCAG ACCCGGCCCT GGCAGCTTGC 

50 

ACTCAACGGA GTACCCACAC GGTGGCTCCC 
AGGCCCAGGC CCACAACAAG GCACTGCCCC 
55 GTAGCAGCAG TGATGGTTCT GAGAGGAGCT 
AGGGTAAGGA GGAGTTTGAG AGGCAACAGA 
AACAGAACAA GATGCAGCTG GAACATCATC 

6.0 

AGATTACAAA GCCCGTGGAG AATGACATCT 
CCACAAACAG TGGCGTGAGT GCTCAGGATC 
65 GTGAGACCCA TTTCATTTCC CTTCTCAACG 
CAGCCCAGCC CCCGCGAATC TTCGTGGCAC 
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CCTTTGGCCA 


ACAGAAGCTC 


TATCAAGTGC 


300 


TCTACCAAGT 


GCCACCTTCC 


TACCAAAATC 


360 


GCACCCAAGA ACAAGAGGTA 


TATCAGGTGC 


420 


CCAGTGGGCC 


CCACGTGGGT 


AAAAAGGTGA 


480 


TATACGAGTA 


CCCATCCAGA 


TACCAAAAGG 


540 


CTCAAGGGGT 


ATACGACATC 


CCTCCCTCAT 


600 


TGGGAGAGAT 


AAAACCTCAA 


GGGGTGTATG 


660 


TTCCGCCCTC 


TGCTTGCCGG 


GATGAAGCAG 


720 


CTCCCATGAG 


ACAAGCTGGA AGGCCGGACC 


780 


CAACCTGCAC 


CAAGCCAGCA 


GGGAAGGACC 


840 


GAGCTGCAGA 


ACCGGTGGCT 


CGAAGGCACC 


900 


AACTCGGACA 


GTCAGTGGGC 


TCTCAGAACG 


960 


TTCTTGAGCC 


ACCAGCAGAA 


ACCAGTGAGA 


1020 


ATGATGTCCC 


TCTGCATAAC 


CCGCCAGATG 


1080 


TCAACCGATT 


GTCTTTCTCC 


AGTACAGGCA 


1140 


CCTCCTCCAA 


GGAGTCCTCA 


CTGTCAGCCT 


1200 


ATCCAGACAC 


AGCTATTGAG 


AGACTTCAGC 


1260 


CCAGCCTAAT 


GGCACTGGTC 


ACTACCGACT 


1320 


TCAATGAAAT 


ACGCACAGCA 


GTGGACAAGG 


1380 


TTGTCAAGGG 


AGCTGTTGCA 


AATGCTGCCT 


1440 


TGAAGCGGGA 


GCTGCAACGA 


GTCGAAGACT 


1500 


ACTTAAATGA 


GTGCAGCTGG 


TCCCTGAATA 


1560 


GTGACGATCT 


GGACCGGTTT 


GTGATGGTGG 


1620 


TCACCACAAC 


CATCAACACC AACGCAGAGG 


1680 


ATCTGAAGAA 


TGGGCCGGAG 


AGCATCATGA 


1740 


AGGGACAGCT 


GCTGCATCCT 


GGTGACCACA 


1800 


CAGGCCTGAG 


CAAGGAGCAG 


GCCCCTGACT 


1860 


GGATGGATGA 


CTACGATTAC 


GTCCACCTAC 


1920 


AAGAGCTATT 


GGAAAAAGAG 


AATATCATGA 


1980 


AGCTGAGCCA 


GTTCCAGCTG 


TTGGAACAAG 


2040 


CGAAGTGGAA 


GCCCTCTCAG 


AGCCTACCCA 


2100 


GGCAGTTGCT 


GTGCTTCTAC 


TATGACCAAT 


2160 


CCATTGACGC 


ACTCTTCAGT 


TGTGTCAGCT 


2220 


ACAGCAAGTT 


TGTCATCCTC 


AGTGCACACA 


2280 
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AACTGGTGTT CATTGGAGAC ACGCTGACAC GGCAGGTGAC TGCCCAGGAC ATTCGCAACA 234 0 

AAGTCATGAA CTCCAGCAAC CAGCTCTGCG AGCAGCTCAA GACTATAGTC ATGGCAACCA 2400 

AGATGGCCGC CCTCCATTAC CCCAGCACCA CGGCCCTGCA GGAAATGGTG CACCAAGTGA 2460 

CAGACCTTTC TAGAAATGCC CAGCTGTTCA AGCGCTCTTT GCTGGAGATG GCAACGTTCT 2520 

GAGAAGAAAA AAAAGAGGAA GGGGACTGCG TTAACGGTTA CTAAGGAAAA CTGGAAATAC 2580 

TGTCTGGTTT TTGTAAATGT TATCTATTTT TGTAGATAAT TTTATATAAA AATGAAATAT 2640 

TTTAACATTT TATGGGTCAG ACAACTTTCA GAAATTCAGG GAGCTGGAGA GGGAAATCTT 2700 

15 TTTTTCCCCC CTGAGTNGTT CTTATGTATA CACAGAAGTA TCTGAGACAT AAACTGTACA 2760 

GAAAACTTGT CCACGTCCTT TTGTATGCCC ATGTATTCAT GTTTTTGTTT GTAGATGTTT 2820 

GTCTGATGCA TTTCATTAAA AAAAAAACCA TGAATTACGA AGCACCTTAG TAAGCACCTT 2880 

20 

CTAATGCTGC ATTTTTTTTG TTGTTGTTAA AAACATCCAG CTGGTTATAA TATTGTTCTC 2940 

CACGTCCTTG TGATGATTCT GAGCCTGGCA CTGGGAATCT GGGAAGCATA GTTTATTTGC 3000 

AAGTGTTCAC CTTCCAAATC ATGAGGCATA GCATGACTTA TTCTTGTTTT GAAAACTCTT 3060

TTCAAAACTG ACCATCTTAA ACACATGATG GCCAAGTGCC ACAAAGCCCT CTTGCGGAGA 3120 

CATTTACGAA TATATATGTG GATCCAAGTC TCGATAGTTA GGCGTTGGAG GGAAGAGAGA 3180 

CCAGAGAGTT TAGAGGCCAG GACCACAGTT AGGATTGGGT TGTTTCAATA CTGAGAGACA 3240 

GCTACAATAA AAGGAGAGCA ATTGCCTCCC TGGGGCTGTT CAATCTTCTG CATTTGTGAG 3300

TGGTTCAGTC ATGAGGTTTT CCAAAAGATG TTTTTAGAGT TGTAAAAACC ATATTTGCAG 3360

CAAAGATTTA CAAAGGCGTA TCAGACTATG ATTGTTCACC AAAATAGGGG AATGGTTTGA 3420 

TCCGCCAGTT GCAAGTAGAG GCCTTTCTGA CTCTTAATAT TCACTTTGGT GCTACTACCC 3480 

CCATTACCTG AGGAACTGGC CAGGTCCTTG ATCATGGAAC TATAGAGCTA CCAGACATAT 3540 

CCTGCTCTCT AAGGGAATTT ATTGCTATCT TGCACCTTCT TTAAAACTCA AAAAACATAT 3600 

GCAGACCTGA CACTCAAGAG TGGCTAGCTA CACAGAGTCC ATCTAATTTT TGCAACTTCC 3660

CCCCCCGAAT TC 3672 






(2) INFORMATION FOR SEQ ID NO:2: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 834 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



WO 97/02362 



PCT/US96/10823 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Lys Tyr Lys Asn Leu Met Ala Arg Ala Leu Tyr Asp Asn Val Pro
1               5                   10                  15

Glu Cys Ala Glu Glu Leu Ala Phe Arg Lys Gly Asp Ile Leu Thr Val
            20                  25                  30

Ile Glu Gln Asn Thr Gly Gly Leu Glu Gly Trp Trp Leu Cys Ser Leu
        35                  40                  45

His Gly Arg Gln Gly Ile Val Pro Gly Asn Arg Val Lys Leu Leu Ile
    50                  55                  60

Gly Pro Met Gln Glu Thr Ala Ser Ser His Glu Gln Pro Ala Ser Gly
65                  70                  75                  80

Leu Met Gln Gln Thr Phe Gly Gln Gln Lys Leu Tyr Gln Val Pro Asn
                85                  90                  95

Pro Gln Ala Ala Pro Arg Asp Thr Ile Tyr Gln Val Pro Pro Ser Tyr
            100                 105                 110

Gln Asn Gln Gly Ile Tyr Gln Val Pro Thr Gly His Gly Thr Gln Glu
        115                 120                 125

Gln Glu Val Tyr Gln Val Pro Pro Ser Val Gln Arg Ser Ile Gly Gly
    130                 135                 140

Thr Ser Gly Pro His Val Gly Lys Lys Val Ile Thr Pro Val Arg Thr
145                 150                 155                 160

Gly His Gly Tyr Val Tyr Glu Tyr Pro Ser Arg Tyr Gln Lys Asp Val
                165                 170                 175

Tyr Asp Ile Pro Pro Ser His Thr Thr Gln Gly Val Tyr Asp Ile Pro
            180                 185                 190

Pro Ser Ser Ala Lys Gly Pro Val Phe Ser Val Pro Val Gly Glu Ile
        195                 200                 205

Lys Pro Gln Gly Val Tyr Asp Ile Pro Pro Thr Lys Gly Val Tyr Ala
    210                 215                 220

Ile Pro Pro Ser Ala Cys Arg Asp Glu Ala Gly Leu Arg Glu Lys Asp
225                 230                 235                 240

Tyr Asp Phe Pro Pro Pro Met Arg Gln Ala Gly Arg Pro Asp Leu Arg
                245                 250                 255

Pro Glu Gly Val Tyr Asp Ile Pro Pro Thr Cys Thr Lys Pro Ala Gly
            260                 265                 270

Lys Asp Leu His Val Lys Tyr Asn Cys Asp Ile Pro Gly Ala Ala Glu
        275                 280                 285

Pro Val Ala Arg Arg His Gln Ser Leu Ser Pro Asn His Pro Pro Pro
    290                 295                 300

Gln Leu Gly Gln Ser Val Gly Ser Gln Asn Asp Ala Tyr Asp Val Pro
305                 310                 315                 320

Arg Gly Val Gln Phe Leu Glu Pro Pro Ala Glu Thr Ser Glu Lys Ala
                325                 330                 335

Asn Pro Gln Glu Arg Asp Gly Val Tyr Asp Val Pro Leu His Asn Pro
            340                 345                 350

Pro Asp Ala Lys Gly Ser Arg Asp Leu Val Asp Gly Ile Asn Arg Leu
        355                 360                 365

Ser Phe Ser Ser Thr Gly Ser Thr Arg Ser Asn Met Ser Thr Ser Ser
    370                 375                 380

Thr Ser Ser Lys Glu Ser Ser Leu Ser Ala Ser Pro Ala Gln Asp Lys
385                 390                 395                 400

Arg Leu Phe Leu Asp Pro Asp Thr Ala Ile Glu Arg Leu Gln Arg Leu
                405                 410                 415

Gln Gln Ala Leu Glu Met Gly Val Ser Ser Leu Met Ala Leu Val Thr
            420                 425                 430

Thr Asp Trp Arg Cys Tyr Gly Tyr Met Glu Arg His Ile Asn Glu Ile
        435                 440                 445

Arg Thr Ala Val Asp Lys Val Glu Leu Phe Leu Lys Glu Tyr Leu His
    450                 455                 460

Phe Val Lys Gly Ala Val Ala Asn Ala Ala Cys Leu Pro Glu Leu Ile
465                 470                 475                 480

Leu His Asn Lys Met Lys Arg Glu Leu Gln Arg Val Glu Asp Ser His
                485                 490                 495

Gln Ile Leu Ser Gln Thr Ser His Asp Leu Asn Glu Cys Ser Trp Ser
            500                 505                 510

Leu Asn Ile Leu Ala Ile Asn Lys Pro Gln Asn Lys Cys Asp Asp Leu
        515                 520                 525

Asp Arg Phe Val Met Val Ala Lys Thr Val Pro Asp Asp Ala Lys Gln
    530                 535                 540

Leu Thr Thr Thr Ile Asn Thr Asn Ala Glu Ala Leu Phe Arg Pro Gly
545                 550                 555                 560

Pro Gly Ser Leu His Leu Lys Asn Gly Pro Glu Ser Ile Met Asn Ser
                565                 570                 575

Thr Glu Tyr Pro His Gly Gly Ser Gln Gly Gln Leu Leu His Pro Gly
            580                 585                 590

Asp His Lys Ala Gln Ala His Asn Lys Ala Leu Pro Pro Gly Leu Ser
        595                 600                 605

Lys Glu Gln Ala Pro Asp Cys Ser Ser Ser Asp Gly Ser Glu Arg Ser
    610                 615                 620

Trp Met Asp Asp Tyr Asp Tyr Val His Leu Gln Gly Lys Glu Glu Phe
625                 630                 635                 640

Glu Arg Gln Gln Lys Glu Leu Leu Glu Lys Glu Asn Ile Met Lys Gln
                645                 650                 655

Asn Lys Met Gln Leu Glu His His Gln Leu Ser Gln Phe Gln Leu Leu
            660                 665                 670

Glu Gln Glu Ile Thr Lys Pro Val Glu Asn Asp Ile Ser Lys Trp Lys
        675                 680                 685

Pro Ser Gln Ser Leu Pro Thr Thr Asn Ser Gly Val Ser Ala Gln Asp
    690                 695                 700

Arg Gln Leu Leu Cys Phe Tyr Tyr Asp Gln Cys Glu Thr His Phe Ile
705                 710                 715                 720

Ser Leu Leu Asn Ala Ile Asp Ala Leu Phe Ser Cys Val Ser Ser Ala
                725                 730                 735

Gln Pro Pro Arg Ile Phe Val Ala His Ser Lys Phe Val Ile Leu Ser
            740                 745                 750

Ala His Lys Leu Val Phe Ile Gly Asp Thr Leu Thr Arg Gln Val Thr
        755                 760                 765

Ala Gln Asp Ile Arg Asn Lys Val Met Asn Ser Ser Asn Gln Leu Cys
    770                 775                 780

Glu Gln Leu Lys Thr Ile Val Met Ala Thr Lys Met Ala Ala Leu His
785                 790                 795                 800

Tyr Pro Ser Thr Thr Ala Leu Gln Glu Met Val His Gln Val Thr Asp
                805                 810                 815

Leu Ser Arg Asn Ala Gln Leu Phe Lys Arg Ser Leu Leu Glu Met Ala
            820                 825                 830

Thr Phe 
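The three-letter listing of SEQ ID NO:2 can be cross-checked against the one-letter translation printed under the nucleotide rows in Figure 1. The following is a minimal Python sketch of that conversion, using the standard three-letter and one-letter amino acid codes. The helper name `to_one_letter` is illustrative only and is not part of the specification.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_row: str) -> str:
    """Convert a whitespace-separated row of three-letter codes.

    Raises KeyError if a token is not one of the twenty standard codes,
    which makes transcription errors in a listing easy to spot.
    """
    return "".join(THREE_TO_ONE[token] for token in three_letter_row.split())


# Residues 1-16 of SEQ ID NO:2, as listed above.
first_row = "Met Lys Tyr Lys Asn Leu Met Ala Arg Ala Leu Tyr Asp Asn Val Pro"
print(to_one_letter(first_row))  # MKYKNLMARALYDNVP
```

The printed string can then be compared with the first translated residues shown in Figure 1A.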






(2) INFORMATION FOR SEQ ID NO: 3: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 872 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:



Met Lys Tyr Leu Asn Val Leu Ala Lys Ala Leu Tyr Asp Asn Val Ala
1               5                   10                  15

Glu Ser Pro Asp Glu Leu Ser Phe Arg Lys Gly Asp Ile Met Thr Val
            20                  25                  30

Glu Arg Asp Thr Gln Gly Leu Asp Gly Trp Trp Leu Cys Ser Leu His
        35                  40                  45

Gly Arg Gln Gly Ile Val Pro Gly Asn Arg Leu Lys Ile Leu Val Gly
    50                  55                  60

Met Tyr Asp Lys Lys Pro Ala Ala Pro Gly Pro Gly Pro Pro Ala Thr
65                  70                  75                  80

Pro Pro Gln Pro Gln Pro Ser Leu Pro Gln Gly Val His Thr Pro Val
                85                  90                  95

Pro Pro Ala Ser Gln Tyr Ser Pro Met Leu Pro Thr Ala Tyr Gln Pro
            100                 105                 110

Gln Pro Asp Asn Val Tyr Leu Val Pro Thr Pro Ser Lys Thr Gln Gln
        115                 120                 125

Gly Leu Tyr Gln Ala Pro Gly Asn Pro Gln Phe Gln Ser Pro Pro Ala
    130                 135                 140

Lys Gln Thr Ser Thr Phe Ser Lys Gln Thr Pro His His Ser Phe Pro
145                 150                 155                 160

Ser Pro Ala Thr Asp Leu Tyr Gln Val Pro Pro Gly Pro Gly Ser Pro
                165                 170                 175

Ala Gln Asp Ile Tyr Gln Val Pro Pro Ser Ala Gly Thr Gly His Asp
            180                 185                 190

Ile Tyr Gln Val Pro Pro Ser Leu Asp Thr Arg Ser Trp Glu Gly Thr
        195                 200                 205

Lys Pro Pro Ala Lys Val Val Val Pro Thr Arg Val Gly Gln Gly Tyr
    210                 215                 220

Val Tyr Glu Ala Ser Gln Ala Glu Gln Asp Glu Tyr Asp Thr Pro Arg
225                 230                 235                 240

His Leu Leu Ala Pro Gly Ser Gln Asp Ile Tyr Asp Val Pro Pro Val
                245                 250                 255

Arg Gly Leu Leu Pro Asn Gln Tyr Gly Gln Glu Val Tyr Asp Thr Pro
            260                 265                 270

Pro Met Ala Val Lys Gly Pro Asn Gly Arg Asp Pro Leu Leu Asp Val
        275                 280                 285

Tyr Asp Val Pro Pro Ser Val Glu Lys Gly Leu Pro Pro Ser Asn His
    290                 295                 300

His Ser Val Tyr Asp Val Pro Pro Ser Val Ser Lys Asp Val Pro Asp
305                 310                 315                 320

Gly Pro Leu Leu Arg Glu Glu Thr Tyr Asp Val Pro Pro Ala Phe Ala
                325                 330                 335

Lys Pro Lys Pro Phe Asp Pro Thr Arg His Pro Leu Ile Leu Ala Ala
            340                 345                 350

Pro Pro Pro Asp Ser Pro Pro Ala Glu Asp Val Tyr Asp Val Pro Pro
        355                 360                 365

Pro Ala Pro Asp Leu Tyr Asp Val Pro Pro Gly Leu Arg Arg Pro Gly
    370                 375                 380

Pro Gly Thr Leu Tyr Asp Val Pro Arg Glu Arg Val Leu Pro Pro Glu
385                 390                 395                 400

Val Ala Asp Gly Ser Val Ile Asp Asp Gly Val Tyr Ala Val Pro Pro
                405                 410                 415

Pro Ala Glu Arg Glu Ala Pro Thr Asp Gly Lys Arg Leu Ser Ala Ser
            420                 425                 430

Ser Thr Gly Ser Thr Arg Ser Ser Gln Ser Ala Ser Ser Leu Glu Val
        435                 440                 445

Val Val Pro Gly Arg Glu Pro Leu Glu Leu Glu Val Ala Val Glu Thr
    450                 455                 460

Leu Ala Arg Leu Gln Gln Gly Val Ser Thr Thr Val Ala His Leu Leu
465                 470                 475                 480

Asp Leu Val Gly Ser Ala Ser Gly Pro Gly Gly Trp Arg Ser Thr Ser
                485                 490                 495

Glu Pro Gln Glu Pro Pro Val Gln Asp Leu Lys Ala Ala Val Ala Ala
            500                 505                 510

Val His Gly Ala Val His Glu Leu Leu Glu Phe Ala Arg Ser Ala Val
        515                 520                 525

Ser Ser Ala Thr His Thr Ser Asp Arg Thr Leu His Ala Lys Leu Ser
    530                 535                 540

Arg Gln Leu Gln Lys Met Glu Asp Val Tyr Gln Thr Leu Val Val His
545                 550                 555                 560

Gly Gln Val Leu Asp Ser Gly Arg Gly Gly Pro Gly Phe Thr Leu Asp
                565                 570                 575

Asp Leu Asp Thr Leu Val Ala Cys Ser Arg Ala Val Pro Glu Asp Ala
            580                 585                 590

Lys Gln Leu Ala Ser Phe Leu His Gly Asn Ala Ser Leu Leu Phe Arg
        595                 600                 605

Arg Thr Lys Ala Pro Gly Pro Gly Pro Glu Gly Ser Ser Ser Leu His
    610                 615                 620

Leu Asn Pro Thr Asp Lys Ala Ser Ser Ile Gln Ser Arg Pro Leu Pro
625                 630                 635                 640

Ser Pro Pro Lys Phe Thr Ser Gln Asp Ser Pro Asp Gly Gln Tyr Glu
                645                 650                 655

Asn Ser Glu Gly Gly Trp Met Glu Asp Tyr Asp Tyr Val His Leu Gln
            660                 665                 670

Gly Lys Glu Glu Phe Glu Lys Thr Gln Lys Glu Leu Leu Glu Lys Gly
        675                 680                 685

Asn Ile Val Arg Gln Gly Lys Gly Gln Leu Glu Leu Gln Gln Leu Lys
    690                 695                 700

Gln Phe Glu Arg Leu Glu Gln Glu Val Ser Arg Pro Ile Asp His Asp
705                 710                 715                 720

Leu Ala Asn Trp Thr Pro Ala Gln Pro Leu Val Pro Gly Arg Thr Gly
                725                 730                 735

Gly Leu Gly Pro Ser Asp Arg Gln Leu Leu Leu Phe Tyr Leu Glu Gln
            740                 745                 750

Cys Glu Ala Asn Leu Thr Thr Leu Thr Asp Ala Val Asp Ala Phe Phe
        755                 760                 765

Thr Ala Val Ala Thr Asn Gln Pro Pro Lys Ile Phe Val Ala His Ser
    770                 775                 780

Lys Phe Val Ile Leu Ser Ala His Lys Leu Val Phe Ile Gly Asp Thr
785                 790                 795                 800

Leu Ser Arg Gln Ala Lys Ala Ala Asp Val Arg Ser Lys Val Thr His
                805                 810                 815

Tyr Ser Asn Leu Leu Cys Asp Leu Leu Arg Gly Ile Val Ala Thr Thr
            820                 825                 830

Lys Ala Ala Ala Leu Gln Tyr Pro Ser Pro Ser Ala Ala Gln Asp Met
        835                 840                 845

Val Asp Arg Val Lys Glu Leu Gly His Ser Thr Gln Gln Phe Arg Arg
    850                 855                 860

Val Leu Gly Gln Leu Ala Ala Ala
865                 870






(2) INFORMATION FOR SEQ ID NO:4: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 78 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: C-terminal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Leu Ser Gln Phe Gln Leu Leu Glu Gln Glu Ile Thr Lys Pro Val Glu
1               5                   10                  15

Asn Asp Ile Ser Lys Trp Lys Pro Ser Gln Ser Leu Pro Thr Thr Asn
            20                  25                  30

Asn Ser Val Gly Ala Gln Asp Arg Gln Leu Leu Cys Phe Tyr Tyr Asp
        35                  40                  45

Gln Cys Glu Thr His Phe Ile Ser Leu Leu Asn Ala Ile Asp Ala Leu
    50                  55                  60

Phe Ser Cys Val Ser Ser Ala Gln Pro Pro Arg Ile Phe Val
65                  70                  75

WHAT IS CLAIMED IS:

1. An isolated nucleic acid molecule that
includes an open reading frame encoding a mammalian
signal mediator protein between about 795 and about 875
amino acids in length, said protein comprising an
amino-terminal SH3 domain, an internal domain that includes a
multiplicity of SH2 binding motifs, and a
carboxy-terminal effector domain, said effector domain, when
produced in Saccharomyces cerevisiae, being capable of
inducing pseudohyphal budding in said Saccharomyces
cerevisiae under pre-determined culture conditions.

2. The nucleic acid molecule of claim 1, which
is DNA.

3. The DNA molecule of claim 2, which is a
cDNA comprising a sequence approximately 3.7 kilobase
pairs in length that encodes said signal mediator
protein.

4. The DNA molecule of claim 2, which is a
gene, the exons of which comprise said open reading frame
encoding said signal mediator protein.

5. The nucleic acid molecule of claim 1, which
is RNA.

6. An oligonucleotide between about 10 and
about 100 nucleotides in length, which specifically
hybridizes with a portion of the nucleic acid molecule of
claim 1.

7. The oligonucleotide of claim 6, wherein
said portion includes a translation initiation site of
said signal mediator protein.

8. The nucleic acid molecule of claim 1,
wherein said open reading frame encodes a human signal
mediator protein.

9. The nucleic acid molecule of claim 8,
wherein said open reading frame encodes a human signal
mediator protein having an amino acid sequence
substantially the same as Sequence I.D. No. 2.

10. The nucleic acid molecule of claim 9,
wherein said open reading frame encodes amino acid
Sequence I.D. No. 2.

11. The nucleic acid molecule of claim 10,
which comprises Sequence I.D. No. 1.

12. An isolated protein, which is a product of
expression of part or all of the open reading frame of
claim 1.

13. An isolated nucleic acid molecule having a
sequence selected from the group consisting of:

a) Sequence I.D. No. 1;

b) a sequence hybridizing with part
or all of the complementary strand of Sequence I.D. No. 1
and encoding a polypeptide substantially the same as part
or all of a polypeptide encoded by Sequence I.D. No. 1;
and

c) a sequence encoding part or all
of a polypeptide having amino acid Sequence I.D. No. 2.

14. An isolated nucleic acid molecule having a
sequence that encodes a carboxy-terminal effector domain
of a mammalian signal mediator protein, said domain
having an amino acid sequence greater than 74% similar to
a portion of Sequence I.D. No. 2 comprising amino acids
626-834.

15. The nucleic acid molecule of claim 14,
wherein the amino acid sequence of said carboxy-terminal
effector domain is greater than about 57% identical to a
portion of Sequence I.D. No. 2 comprising amino acids
626-834.

16. The nucleic acid molecule of claim 14,
having a sequence that encodes an amino acid sequence
greater than 65% similar to Sequence I.D. No. 2.

17. An isolated mammalian signal mediator
protein having a deduced molecular weight of between
about 100 kDa and about 115 kDa; said protein comprising
an amino-terminal SH3 domain, an internal domain that
includes a multiplicity of SH2 binding motifs, and a
carboxy-terminal effector domain, said effector domain,
when produced in Saccharomyces cerevisiae, being capable
of inducing pseudohyphal budding in said Saccharomyces
cerevisiae under pre-determined culture conditions.

18. The protein of claim 17, of human origin.

19. The protein of claim 18, having an amino
acid sequence substantially the same as Sequence I.D. No.
2.

20. The protein of claim 19, having amino acid
Sequence I.D. No. 2.

21. An antibody immunologically specific for
part or all of the protein of claim 17.

22. A polypeptide produced by expression of an
isolated nucleic acid sequence selected from the group
consisting of:

a) Sequence I.D. No. 1;

b) a sequence hybridizing with part
or all of the complementary strand of Sequence I.D. No. 1
and encoding a polypeptide substantially the same as part
or all of a polypeptide encoded by Sequence I.D. No. 1;
and

c) a sequence encoding part or all
of a polypeptide having Sequence I.D. No. 2.

23. An antibody immunologically specific for
part or all of the polypeptide of claim 22.

24. An isolated mammalian signal mediator
protein, which comprises a carboxy-terminal effector
domain having an amino acid sequence greater than 74%
similar to a portion of Sequence I.D. No. 2 comprising
amino acids 626-834.

25. The protein of claim 24, wherein the amino
acid sequence of said carboxy-terminal effector domain is
greater than about 57% identical to a portion of Sequence
I.D. No. 2 comprising amino acids 626-834.

26. The protein of claim 24, having an amino
acid sequence greater than 65% similar to Sequence I.D.
No. 2.

27. An antibody immunologically specific for 
part or all of the protein of claim 24. 
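
Claims 14 to 16 and 24 to 26 recite thresholds of percent identity and percent similarity between amino acid sequences. As an illustration only, the following Python sketch computes ungapped percent identity over two windows of equal length that are assumed to be already aligned. It is not the comparison method used in the specification. Percent similarity would additionally require a scoring scheme for conservative substitutions, which is not reproduced here. The function name `percent_identity` and the choice of window (residues 625-650 of SEQ ID NO:2 against residues 662-687 of SEQ ID NO:3) are assumptions made for the example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions with identical residues in two
    pre-aligned, equal-length, ungapped one-letter sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Illustrative 26-residue windows taken from the sequence listing.
seq2_window = "WMDDYDYVHLQGKEEFERQQKELLEK"  # SEQ ID NO:2, residues 625-650
seq3_window = "WMEDYDYVHLQGKEEFEKTQKELLEK"  # SEQ ID NO:3, residues 662-687

# 23 of the 26 positions are identical.
print(round(percent_identity(seq2_window, seq3_window), 1))  # 88.5
```

A comparison over a full domain would normally start from a gapped alignment produced by an alignment program. The sketch only shows how the final percentage is obtained once such an alignment exists.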



a ccc c c acgc t a c cgaaATGAAGTATAAGAATCTTATGGCAAGGGCCTTATATGACAAT
MKYKNLMARALYDN
GTCCCAGAGTGTGCCGAGGAACTGGCCTTTCGCAAGGGAGACATCCTGACCGTCATAGAG 
VPECAEELAFRKGDILTVIE 
CAGAACACAGGGGGACTGGAAGGATGGTGGCTGTGCTCGTTACACGGTCGGGAAGGCATT 
QNTGGLEGWWLCSLHGRQGI 
GTCCCAGGCAACCGGGTGAAGCTTCTGATTGGCCCCATGCAGGAGACTGCCTCCAGTCAC 
VPGNRVKLLIGPMQETASSH 
GAGCAGCCTGCCTCTGGACTGATGCAGCAGACCTTTGGCCAACAGAAGCTCTATCAAGTG 
EQPASGLMQQTFGQQKLYQV 
CCAAACCCACAGGCTGCTCCCCGAGACACTATCTACCAAGTGCCACCTTCCTACCAAAAT 
PNPQAAPRDTIYQVPPSYQN 
CAGGGAATTTACCAAGTCCCCACTGGCCACGGCACCCAAGAACAAGAGGTATATCAGGTG 
QGIYQVPTGHGTQEQEVYQV 
CCACCATCAGTGCAGAGAAGCATTGGGGGAACCAGTGGGCCCCACGTGGGTAAAAAGGTG 
PPSVQRSIGGTSGPHVGKKV 
ATAACCCCCGTGAGGACAGGCCATGGCTACGTATACGAGTACCCATCCAGATACCAAAAG
ITPVRTGHGYVYEYPSRYQK 
GATGTCTATGATATCCCTCCTTCTCATACCACTCAAGGGGTATACGACATCCCTCCCTCA 
DVYDIPPSHTTQGVYDIPPS 
TCAGCAAAAGGCCCTGTGTTTTCAGTTCCAGTGGGAGAGATAAAACCTCAAGGGGTGTAT 
SAKGPVFSVPVGEIKPQGVY 
GACATCCCGCCTACAAAAGGGGTATATGCCATTCCGCCCTCTGCTTGCCGGGATGAAGCA 
DIPPTKGVYAIPPSACRDEA 
GGGCTTAGGGAAAAAGACTATGACTTCCCCCCTCCCATGAGACAAGCTGGAAGGCCGGAC 
GLREKDYDFPPPMRQAGRPD 
CTCAGACCGGAGGGGGTTTATGACATTCCTCCAACCTGCACCAAGCCAGCAGGGAAGGAC 
LRPEGVYDIPPTCTKPAGKD
CTTCATGTAAAATACAACTGTGACATTCCAGGAGCTGCAGAACCGGTGGCTCGAAGGCAC
LHVKYNCDIPGAAEPVARRH

Figure 1A

CAGAGCCTGTCCCCGAATCACCCACCCCCGCAACTCGGACAGTCAGTGGGCTCTCAGAAC 

QSLSPNHPPPQLGQSVGSQN 

GACGCATATGATGTCCCCCGAGGCGTTCAGTTTCTTGAGCCACCAGCAGAAACCAGTGAG 

DAYDVPRGVQFLEPPAETSE 

AAAGCAAACCCCCAGGAAAGGGATGGTGTTTATGATGTCCCTCTGCATAACCCGCCAGAT 

KANPQERDGVYDVPLHNPPD 

GCTAAAGGCTCTCGGGACTTGGTGGATGGGATCAACCGATTGTCTTTCTCCAGTACAGGC 

AKGSRDLVDGINRLSFSSTG

AGCACCCGGAGTAACATGTCCACGTCTTCCACCTCCTCCAAGGAGTCCTCACTGTCAGCC 

STRSNMSTSSTSSKESSLSA 

TCCCCAGCTCAGGACAAAAGGCTCTTCCTGGATCCAGACACAGCTATTGAGAGACTTCAG

SPAQDKRLFLDPDTAIERLQ 

CGGCTCCAGCAGGCCCTTGAGATGGGTGTCTCCAGCCTAATGGCACTGGTCACTACCGAC 

RLQQALEMGVSSLMALVTTD 

TGGCGGTGTTACGGATATATGGAAAGACACATCAATGAAATACGCACAGCAGTGGACAAG 

WRCYGYMERHINEIRTAVDK 

GTGGAGCTGTTCCTGAAGGAGTACCTCCACTTTGTCAAGGGAGCTGTTGCAAATGCTGCC 

VELFLKEYLHFVKGAVANAA 

TGCCTCCCGGAACTCATCCTCCACAACAAGATGAAGCGGGAGCTGCAACGAGTCGAAGAC 

CLPELILHNKMKRELQRVED 

TCCCACCAGATCCTGAGTCAAACCAGCCATGACTTAAATGAGTGCAGCTGGTCCCTGAAT 

SHQILSQTSHDLNECSWSLN

ATCTTGGCCATCAACAAGCCCCAGAACAAGTGTGACGATCTGGACCGGTTTGTGATGGTG 

ILAINKPQNKCDDLDRFVMV 

GCAAAGACGGTGCCCGATGACGCCAAGCAGCTCACCACAACCATCAACACCAACGCAGAG 

AKTVPDDAKQLTTTINTNAE 

GCCCTCTTCAGACCCGGCCCTGGCAGCTTGCATCTGAAGAATGGGCCGGAGAGCATCATG 

ALFRPGPGSLHLKNGPESIM 

Figure 1B

AACTCAACGGAGTACCCACACGGTGGCTCCCAGGGACAGCTGCTGCATCCTGGTGACCAC

NSTEYPHGGSQGQLLHPGDH

AAGGCCCAGGCCCACAACAAGGCACTGCCCCCAGGCCTGAGCAAGGAGCAGGCCCCTGAC 

KAQAHNKALPPGLSKEQAPD

TGTAGCAGCAGTGATGGTTCTGAGAGGAGCTGGATGGATGACTACGATTACGTCCACCTA 

CSSSDGSERSWMDDYDYVHL

CAGGGTAAGGAGGAGTTTGAGAGGCAACAGAAAGAGCTATTGGAAAAAGAGAATATCATG 

QGKEEFERQQKELLEKENIM

AAACAGAACAAGATGCAGCTGGAACATCATCAGCTGAGCCAGTTCCAGCTGTTGGAACAA 

KQNKMQLEHHQLSQFQLLEQ

GAGATTACAAAGCCCGTGGAGAATGACATCTCGAAGTGGAAGCCCTCTCAGAGCCTACCC 

EITKPVENDISKWKPSQSLP

ACCACAAACAGTGGCGTGAGTGCTCAGGATCGGCAGTTGCTGTGCTTCTACTATGACCAA 

TTNSGVSAQDRQLLCFYYDQ 

TGTGAGACCCATTTCATTTCCCTTCTCAACGCCATTGACGCACTCTTCAGTTGTGTCAGC 

CETHFISLLNAIDALFSCVS

TCAGCCCAGCCCCCGCGAATCTTCGTGGCACACAGCAAGTTTGTCATCCTCAGTGCACAC 
SAQPPRIFVAHSKFVILSAH 

AAACTGGTGTTCATTGGAGACACGCTGACACGGCAGGTGACTGCCCAGGACATTCGCAAC 

KLVFIGDTLTRQVTAQDIRN

AAAGTCATGAACTCCAGCAACCAGCTCTGCGAGCAGCTCAAGACTATAGTCATGGCAACC 

KVMNSSNQLCEQLKTIVMAT 

AAGATGGCCGCCCTCCATTACCCCAGCACCACGGCCCTGCAGGAAATGGTGCACCAAGTG 

KMAALHYPSTTALQEMVHQV

ACAGACCTTTCTAGAAATGCCCAGCTGTTCAAGCGCTCTTTGCTGGAGATGGCAACGTTC 

TDLSRNAQLFKRSLLEMATF 

TGAGAAGAAAAAAAAGAGGAAGGGGACTGCGTTAACGGTTACTAAGGAAAACTGGAAATA 
* 

CTGTCTGGTTTTTGTAAATGTTATCTATTTTTGTAGATAATTTTATATAAAAATGAAATA 
TTTTAACATTTTATGGGTCAGACAACTTTCAGAAATTCAGGGAGCTGGAGAGGGAAATCT 
TTTTTTCCCCCCTGAGTXGTTCTTATGTATACACAGAAGTATCTGAGACATAAACTGTAC
AGAAAACTTGTCCACGTCCTTTTGTATGCCCATGTATTCATGTTTTTGTTTGTAGATGTT 

Figure 1C

TGTCTGATGCATTTCATTAAAAAAAAAACCATGAATTACGAAGCACCTTAGTAAGCACCT
TCTAATGCTGCATTTTTTTTGTTGTTGTTAAAAACATCCAGCTGGTTATAATATTGTTCT 
CCACGTCCTTGTGATCATTCTGAGCCTGGCACTGGGAATCTGGGAAGCATAGTTTATTTG 
CAAGTGTTCACCTTCCAAATCATGAGGCATAGCATGACTTATTCTTGTTTTGAAAACTCT 
TTTCAAAACTGACCATCTTAAACACATGATGGCCAAGTGCCACAAAGCCCTCTTGCGGAG 
ACATTTACGAATATATATGTGGATCCAAGTCTCGATAGTTAGGCGTTGGAGGGAAGAGAG 
ACCAGAGAGTTTAGAGGCCAGGACCACAGTTAGGATTGGGTTGTTTCAATACTGAGAGAC 
AGCTACAATAAAAGGAGAGCAATTGCCTCCCTGGGGCTGTTCAATCTTCTGCATTTGTGA 
GTGGTTCAGTCATGAGGTTTTCCAAAAGATGTTTTTAGAGTTGTAAAAACCATATTTGCA 
GCAAAGATTTACAAAGGCGTATCAGACTATGATTGTTCACCAAAATAGGGGAATGGTTTG 
ATCCGCCAGTTGCAAGTAGAGGCCTTTCTGACTCTTAATATTCACTTTGGTGCTACTACC 
CCCATTACCTGAGGAACTGGCCAGGTCCTTGATCATGGAACTATAGAGCTACCAGACATA 
TCCTGCTCTCTAAGGGAATTTATTGCTATCTTGCACCTTCTTTAAAACTCAAAAAACATA 
TGCAGACCTGACACTCAAGAGTGGCTAGCTACACAGAGTCCATCTAATTTTTGCAACTTC 
CCCCCCCGAATTC 



Figure 1D
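
Figure 1 pairs each nucleotide row of the open reading frame with its one-letter translation. The pairing can be reproduced with the standard genetic code, as in the minimal Python sketch below. The function name `translate` is illustrative and not part of the specification. The sketch assumes the input starts in frame, contains only A, C, G and T, and should be read up to the first stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon.

    Trailing bases that do not fill a codon are ignored. A codon with a
    letter other than A, C, G or T raises KeyError.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 14 codons of the open reading frame (Figure 1A, first row).
print(translate("ATGAAGTATAAGAATCTTATGGCAAGGGCCTTATATGACAAT"))  # MKYKNLMARALYDN

# Last five codons followed by the TGA stop codon (Figure 1C).
print(translate("GAGATGGCAACGTTCTGAGAAGAA"))  # EMATF
```

Both outputs agree with the translated residues printed under the corresponding nucleotide rows of Figure 1.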
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